Preface: Idiosyncratic 
and Collective Extreme Risks 


Modern western societies have a paradoxical relationship with risks. On the 
one hand, there is the utopian quest for a zero-risk society [120]. On the other 
hand, human activities may increase risks of all kinds, from collaterals of new 
technologies to global impacts on the planet. The characteristic multiplication 
of major risks in modern society and its reflexive impact on its development 
is at the core of the concept of the “Risk Society” [47]. Correlatively, our
perception of risk has evolved so that catastrophic events (earthquakes, floods,
droughts, storms, hurricanes, volcanic eruptions, and so on) are no longer
systematically perceived as unfair outcomes of an implacable destiny.
Catastrophes may also result from our own technological developments whose
complexity may engender major industrial disasters such as Bhopal, Chernobyl,
and AZF, as well as irreversible global changes such as global warming leading to
climatic disruptions or epidemics from new bacterial and viral mutations. The
proliferation of new sources of risks imposes new responsibilities concerning 
their determination, understanding, and management. Government organiza- 
tions as well as private institutions such as industrial companies, insurance 
companies, and banks which have to face such risks, in their role of regulators 
or of risk bearers, must ensure that the consequences of extreme risks are 
supportable without endangering the institutions in charge of bearing these 
risks. 

In the financial sector, crashes probably represent the most striking events 
among all possible extreme phenomena, with an impact and frequency that 
have been increasing over the last two decades [450]. Consider the worldwide
crash in October 1987 which evaporated more than one thousand billion dol- 
lars in a few days or the more recent collapse of the internet bubble in which 
more than one-third of the world capitalization of 1999 disappeared after 
March 2000. Finance and stock markets are based on the fluid convertibility 
of stocks into money and vice versa. Thus, to work well, money is required
to be a reliable standard of value, that is, an effective store of value, hence the
concerns with the negative impacts of inflation. Similarly, investors look at the
various financial assets as carriers of value, like money, but with additional
return potentials (accompanied by downturn risks). But for this view to
hold so as to promote economic development, fluctuations in values need to 
be tamed to minimize the risk of losing a lifetime of savings, or to avoid 
the risks of losing the investment potential of companies, or even to prevent 
economic and social recessions in whole countries (consider the situation of 
California after 2002 with a budget gap representing more than one-fourth of 
the entire State budget resulting essentially from the losses of financial and 
tax incomes following the collapse of the internet bubble). It is thus highly 
desirable to have the tools for monitoring, understanding, and limiting the
extreme risks of financial markets. Fully aware of these problems, the worldwide
banking organizations have promoted a series of guidelines and norms, known as
the recommendations of the Basle committee [41, 42]. The Basle committee
has proposed models for the internal management of risks and the imposi- 
tion of minimum margin requirements commensurate with the risk exposures. 
However, some criticisms [117, 467] have found these recommendations to be 
ill-adapted or even destabilizing. This controversy underlines the importance 
of a better understanding of extreme risks, of their consequences and ways to 
prevent or at least minimize them. 

In our opinion, tackling this challenging problem requires decomposing
it into two main parts. First, it is essential to be able to accurately quan- 
tify extreme risks. This calls for the development of novel statistical tools 
going significantly beyond the Gaussian paradigm which underpins the stan- 
dard framework of classical financial theory inherited from Bachelier [26], 
Markowitz [347], and Black and Scholes [60] among others. Second, the ex- 
istence of extreme risks must be considered in the context of the practice 
of risk management itself, which leads one to ask whether extreme risks can be
diversified away similarly to standard risks according to the mean-variance
approach. If the answer to this question is negative, as can be surmised from
numerous pieces of concrete empirical evidence, it is necessary to develop new
concepts and tools for the construction of portfolios with minimum (but unavoidable)
exposure to extreme risks. One can think of mixing equities and derivatives,
as long as derivatives themselves do not add an extreme risk component and 
can really provide an insurance against extreme moves, which has been far 
from true in recent dramatic instances such as the crash of October 1987. 
Another approach could involve mutualism as in insurance. 

Risk management, and to the same extent portfolio management, thus re- 
quires a precise and rigorous analysis of the distribution of the returns of the 
portfolio of risks. Taking into account the moderate sizes of standard portfo- 
lios (from tens to thousands of assets typically) and the non-Gaussian nature 
of the distributions of the returns of assets constituting the portfolios, the 
distributions of the returns of typical portfolios are far from Gaussian, in con- 
tradiction with the expectation from a naive use of the central limit theorem 
(see for instance Chap. 2 of [451] and other chapters for a discussion of the 
deviations from the central limit theorem). This breakdown of universality 
then requires a careful estimation of the specific case-dependent distribution
of the returns of a given portfolio. This can be done directly using the time
series of the returns of the portfolio for a given capital allocation. A more con- 
structive approach consists in estimating the joint distribution of the returns 
of all assets constituting the portfolio. The first approach is much simpler and
faster to implement since it requires solely the estimation of a univariate
distribution. However, it lacks generality and power by neglecting the observ- 
able information available from the basket of all returns of the assets. Only 
the multivariate distribution of the returns of the assets embodies the gen- 
eral information of all risk components and their dependence across assets. 
However, the two approaches become equivalent in the following sense: the 
knowledge of the distribution of the returns for all possible portfolios for all 
possible allocations of capital between assets is equivalent to the knowledge 
of the multivariate distributions of the asset returns. All things considered, 
the second approach appears preferable on a general basis and is the method 
mobilizing the largest efforts both in academia and in the private sector. 

However, the frontal attack aiming at the determination of the multivari- 
ate distribution of the asset returns is a challenging task and, in our opinion, 
much less instructive and useful than the separate studies of the marginal 
distributions of the asset returns on the one hand and the dependence struc- 
ture of these assets on the other hand. In this book, we emphasize this second 
approach, with the objective of characterizing as faithfully as possible the di- 
verse origins of risks: the risks stemming from each individual asset and the 
risks having a collective origin. This requires determining (i) the distributions
of returns at different time scales, or more generally, the stochastic process 
underlying the asset price dynamics, and (ii) the nature and properties of 
dependences between the different assets. 

The present book offers an original and systematic treatment of these two 
domains, focusing mainly on the concepts and tools that remain valid for 
large and extreme price moves. Its originality lies in detailed and thorough 
presentations of the state of the art on (i) the different distributions of finan- 
cial returns for various applications (VaR, stress testing), and (ii) the most 
important and useful measures of dependences, both unconditional and con- 
ditional and a study of the impact of conditioning on the size of large moves 
on the measure of extreme dependences. A large emphasis is thus put on the 
theory of copulas, their empirical testing and calibration, as they offer intrin- 
sic and complete measures of dependences. Many of the results presented here 
are novel and have not been published or have been recently obtained by the 
authors or their colleagues. We would like to acknowledge, in particular, the 
fruitful and inspiring discussions and collaborations with J.V. Andersen, U. 
Frisch, J.-P. Laurent, J.-F. Muzy, and V.F. Pisarenko. 

Chapter 1 describes a general framework to develop “coherent measures” of 
risks. It also addresses the origins of risks and of dependence between assets in 
financial markets, from the CAPM (capital asset pricing model) generalized to 
the non-Gaussian case with heterogeneous agents, the APT (arbitrage pricing 
theory), the factor models to the complex system view suggesting an emergent 
nature for the risk-return trade-off. 

Chapter 2 addresses the problem of the precise estimation of the probabil- 
ity of extreme events, based on a description of the distribution of asset returns 
endowed with heavy tails. The challenge is thus to specify accurately these 
heavy tails, which are characterized by poor sampling (large events are rare). 
A major difficulty is to neither underestimate (Gaussian error) nor overestimate
(heavy tail hubris) the extreme events. The quest for a precise quantification 
opens the door to model errors, which can be partially circumvented by using 
several families of distributions whose detailed comparisons allow one to dis- 
cern the sources of uncertainty and errors. Chapter 2 thus discusses several 
classes of heavy-tailed distributions: regularly varying distributions (i.e., with
asymptotic power law tails), stretched-exponential distributions (also known 
as Weibull or subexponentials) as well as log-Weibull distributions which ex- 
trapolate smoothly between these different families. 

The second element of the construction of multivariate distributions of as- 
set returns, addressed in Chaps. 3-6, is to quantify the dependence structure 
of the asset returns. Indeed, large risks are not due solely to the heavy tails of 
the distribution of returns of individual assets but may result from a collective 
behavior. This collective behavior can be completely described by mathemat- 
ical objects called copulas, introduced in Chap. 3, which fully embody the 
dependence between asset returns. 

Chapter 4 describes synthetic measures of dependences, contrasting and 
linking them with the concept of copulas. It also presents an original estima- 
tion method of the coefficient of tail dependence, defined, roughly speaking, as 
the probability for an asset to lose a large amount knowing that another asset 
or the market has also dropped significantly. This tail dependence is of great 
interest because it addresses in a straightforward way the fundamental ques- 
tion whether extreme risks can be diversified away or not by aggregation in 
portfolios. Either the tail dependence coefficient is zero and the extreme losses 
occur asymptotically independently, which opens the possibility of diversify- 
ing them away. Alternatively, the tail dependence coefficient is non-zero and 
extreme losses are fundamentally dependent and it is impossible to completely 
remove extreme risks. The only remaining strategy is to develop portfolios that 
minimize the collective extreme risks, thus generalizing the mean-variance to 
a mean-extreme theory [332, 336, 333]. 

Chapter 5 presents the main methods for estimating copulas of financial 
assets. It shows that the empirical determination of a copula is quite delicate 
with significant risks of model errors, especially for extreme events. Specific 
studies of the extreme dependence are thus required. 

Chapter 6 presents a general and thorough discussion of different mea- 
sures of conditional dependences (where the condition can be on the size(s) 
of one or both returns for two assets). Chapter 6 thus sheds new light on the 
variations of the strength of dependence between assets as a function of the 
sizes of the analyzed events. As a startling concrete application of conditional 
dependences, the phenomenon of contagion during financial crises is discussed 
in detail. 

Chapter 7 presents a synthesis of the six previous chapters and then offers 
suggestions for future work on dependence and risk analysis, including time- 
varying measures of extreme events, endogeneity versus exogeneity, regime 
switching, time-varying lagged dependence and so on. 

This book has been written with the ambition to be useful to (a) the 
student looking for a general and in-depth introduction to the field, (b) fi- 
nancial engineers, economists, econometricians, actuarial professionals and re- 
searchers, and mathematicians looking for a synoptic view comparing the pros 
and cons of different modeling strategies, and (c) quantitative practitioners 
for the insights offered on the subtleties and many dimensional components of 
both risk and dependence. The content of this book will also be useful to the 
broader scientific community in the natural sciences, interested in quantifying 
the complexity of many physical, geophysical, biophysical etc. processes, with 
a mounting emphasis on the role and importance of extreme phenomena and 
their non-standard dependences. 


Lyon, Nice and Los Angeles Yannick Malevergne 
August 2005 Didier Sornette 


An error does not become truth by 
reason of multiplied propagation, nor 
does truth become error because no- 
body sees it. 

M.K. Gandhi 
On the Origin of Risks and Extremes 


1.1 The Multidimensional Nature of Risk 
and Dependence 


In finance, the fundamental variable is the return that an investor accrues from 
his investment in a basket of assets over a certain time period. In general, an 
investor is interested in maximizing his gains while minimizing uncertainties 
(“risks”) on the expected value of the returns on his investment, at possibly 
multiple time scales — depending upon the frequency with which the manager 
monitors the portfolio — and time periods — depending upon the investment 
horizon. From a general standpoint, the return-risk pair is the unavoidable du- 
ality underlying all human activities. The relationship between return and risk 
constitutes one of the most important unresolved questions in finance. This 
question permeates practically all financial engineering applications, and in 
particular the selection of investment portfolios. There is a general consensus 
among academic researchers that risk and return should be related, but the 
exact quantitative specification is still beyond our comprehension [414]. 

Uncertainties come in several forms, which we cite in the order of increasing 
aversion for most human beings: 


(i) stochastic occurrences of events quantified by known probabilities; 

(ii) stochastic occurrences of events with poorly quantified or unknown prob- 
abilities; 

(iii) random events that are “surprises,” i.e., that were previously thought 
to be impossible or unthinkable until they happened and revealed their 
existence. 


Here we address the first form, using the mathematical tools of probability 
theory. 

Within this class of uncertainties, one must still distinguish several 
branches. In the simplest traditional theory exemplified by Markowitz [347], 
the uncertainties underlying a given set of positions (portfolio) result from 
the interplay of two components: risk and dependence. 
(a) Risk is embedded in the amplitude of the fluctuations of the returns. Its 
simplest traditional measure is the standard deviation (square-root of the 
variance). 

(b) The dependence between the different assets of a portfolio of positions 
is traditionally quantified by the correlations between the returns of all 
pairs of assets. 


Thus, in their most basic incarnations, both risk and dependence are thought 
of, respectively, as one-dimensional quantities: the standard deviation of 
the distribution of returns of a given asset and the correlation coefficient 
of these returns with those of another asset of reference (the “market” for 
instance). The standard deviation (or volatility) of portfolio returns provides 
the simplest way to quantify its fluctuations and is at the basis of Markowitz’s 
portfolio selection theory [347]. However, the standard deviation of a portfolio 
offers only a limited quantification of incurred risks (seen as the statistical fluc- 
tuations of the realized return around its expected — or anticipated — value). 
This is because the empirical distributions of returns have “fat tails” (see 
Chap. 2 and references therein), a phenomenon associated with the occur- 
rence of non-typical realizations of the returns. In addition, the dependences 
between assets are only imperfectly accounted for by the covariance matrix 
[309]. 
The last few decades have seen two important extensions. 


• First, it has become clear, as synthesized in Chap. 2, that the standard 
deviation offers only a reductive view of the genuine full set of risks em- 
bedded in the distribution of returns of a given asset. As distributions of 
returns are in general far from Gaussian laws, one needs more than one 
centered moment (the variance) to characterize them. In principle, an in- 
finite set of centered moments is required to faithfully characterize the 
potential for small all the way to extreme risks because, in general, large 
risks cannot be predicted from the knowledge of small risks quantified by 
the standard deviation. Alternatively, the full space of risks needs to be 
characterized by the full distribution function. It may also be that the dis- 
tributions are so heavy-tailed that moments do not exist beyond a finite 
order, which is the realm of asymptotic power law tails, of which the stable 
Lévy laws constitute an extreme class. The Value-at-Risk (VaR) [257] and 
many other measures of risks [19, 20, 73, 447, 453] have been developed to 
account for the larger moves allowed by non-Gaussian distributions and 
non-linear correlations. 

• Second and more recently, the correlation coefficient (and its associated 
covariance) has been shown to only be a partial measure of the full de- 
pendence structure between assets. Similarly to risks, a full understanding 
of the dependence between two or more assets requires, in principle, an 
infinite number of quantifiers or a complete dependence function such as 
the copulas, defined in Chap. 3. 
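A minimal sketch of why the correlation coefficient is only a partial measure of dependence: in the toy example below (hypothetical values, not data from this book), Y is a deterministic function of X, yet their linear correlation vanishes. The helper `pearson` is written out by hand for illustration.

```python
from statistics import mean


def pearson(xs, ys):
    """Sample linear correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy returns: X takes five symmetric values with equal
# probability, and Y = X**2 is completely determined by X.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

corr = pearson(xs, ys)

# Probability that Y is large, unconditionally and given that |X| is large.
p_y_large = mean(1.0 if y >= 4 else 0.0 for y in ys)
p_y_large_given_x_large = mean(
    1.0 if y >= 4 else 0.0 for x, y in zip(xs, ys) if abs(x) >= 2
)

print(corr, p_y_large, p_y_large_given_x_large)
```

The correlation is zero although a large move of X makes a large value of Y certain, so the dependence between large moves is entirely missed by the covariance.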
These two fundamental extensions from one-dimensional measures of risk 
and dependence to infinitely dimensional measures of risk and dependence 
constitute the core of this book. Chapter 2 reviews our present knowledge 
and the open challenges in the characterization of distribution of returns. 
Chapter 3 introduces the notion of copulas which are applied later in Chap. 5 
to financial dependences. Chapter 4 describes the main properties of the most 
important and varied measures of dependence, and underlines their connec- 
tions with copulas. Finally, Chap. 6 expands on the best methods to capture 
the dependence between extreme returns. 

Understanding the risks of a portfolio of N assets involves the characteriza- 
tion of both the marginal distributions of asset returns and their dependence. 
In principle, this requires the knowledge of the full (time-dependent) mul- 
tivariate distribution of returns, which is the joint probability of any given 
realization of the N asset returns at a given time. This remark entails the 
two major problems of portfolio theory: (1) to determine the multivariate 
distribution function of asset returns; (2) to derive from it useful measures 
of portfolio risks and use them to analyze and optimize the performance of 
the portfolios. There is a large literature on multivariate distributions and 
multivariate statistical analysis [363, 468, 282]. This literature includes: 


• the use of the multivariate normal distribution on density estimation [428];
• the corresponding random vectors treated with matrix algebra, and thus
  on matrix methods and multivariate statistical analysis [173, 371];
• the robust determination of multivariate means and covariances [297, 298];
• the use of multivariate linear regression and factor models [160, 161];
• principal component analysis, with excursions in clustering and
  classification techniques [276, 254];
• methods for data analysis in cases with missing observations [133, 310];
• detecting outliers [249, 250];
• bootstrap methods and handling of multicollinearity [461];
• methods of estimation using the plug-in principles and maximum
  likelihood [144];
• hypothesis testing using likelihood ratio tests and permutation tests [398];
• discrete multivariate distributions [253];
• computer-aided geometric design, geometric modeling, geodesic
  applications, and image analysis [464, 105, 426];
• radial basis functions [86], scattered data on spheres, and shift-invariant
  spaces [139, 433];
• non-uniform spline wavelets [139];
• scalable algorithms in computer graphics [76];
• reverse engineering [139], and so on.


The growing literature on (1) non-stationary processes [85, 210, 222, 361] 
and (2) regime-switching [172, 180, 215, 269] is not covered here. Nor do 
we address the more complex issues of embedding financial modeling within 
economics and social sciences. We do not cover either the consequences for risk 
assessment coming from the important emerging field of behavioral finance, 
with its exploration of the impact on decision-making of imperfect bounded 
subjective probability perceptions [36, 206, 437, 439, 474]. Our book thus uses 
objective probabilities which can be estimated (with quantifiable errors) from 
suitable analysis of available data. 


1.2 How to Rank Risks Coherently? 


The question of how to rank risks, so as to make optimal decisions, is recurrent
in finance (and in many other fields) but has not yet received a general
solution. 

Since the middle of the twentieth century, several paths have been
explored. The pioneering work by von Neumann and Morgenstern [482] has given
birth to the mathematical definition of the expected utility function, which
provides interesting insights on the behavior of a rational economic agent
and has formalized the concept of risk aversion. Based upon the properties
of the utility function, Rothschild and Stiglitz [419, 420] have attempted to
define the notion of increasing risks. But, as revealed by Allais [4, 5],
empirical investigations have proven that the postulates chosen by von Neumann
and Morgenstern are actually often violated by humans. Many generalizations 
have been proposed for curing the so-called Allais’ Paradox, but until now, 
no generally accepted procedure has been found. 

Recently, a theory due to Artzner et al. [19, 20] and its generalization by
Föllmer and Schied [174, 175] have appeared. Based on a series of postulates
that are quite natural, this theory allows one to build coherent (resp., convex)
measures of risks that provide tools to compare and rank risks [383]. Although
this theory seems well adapted to the assessment of the needed economic
capital, that is, of the fraction of capital a company must keep as risk-free
assets in order to face its commitments and thus avoid ruin, it seems less
natural for the purpose of quantifying the fluctuations of the asset returns
or equivalently the deviation from a predetermined objective. In fact, as will
be shown in this section, it turns out that the two approaches consisting in
assessing the risk in terms of economic capital on the one hand, and in terms 
of deviations from an objective on the other hand, are actually the two sides 
of the same coin as recently shown in [407, 408]. 


1.2.1 Coherent Measures of Risks 


According to Artzner et al. [19, 20], the risk involved in the variations of the 
values of a market position is measured by the amount of capital invested 
in a risk-free asset, such that the market position can be prolonged in the 
future. In other words, the potential losses should not endanger the future 
actions of the fund manager of the company, or more generally, of the person 
or structure which underwrites the position. In this sense, a risk measure 
constitutes for Artzner et al. a measure of economic capital. The risk measure
$\rho$ can be either positive, if the risk-free capital must be increased to guarantee
the risky position, or negative, if the risk-free capital can be reduced without
invalidating it.

A risk measure is said to be coherent in the sense of Artzner et al. [19, 20] 
if it obeys the four properties or axioms that we now list. Let us call $\mathcal{G}$ the
space of risks. If the space $\Omega$ of all possible states of nature is finite, $\mathcal{G}$ is
isomorphic to $\mathbb{R}^N$ and a risky position $X$ is nothing but a vector in $\mathbb{R}^N$. A risk
measure $\rho$ is then a map from $\mathbb{R}^N$ onto $\mathbb{R}$. A generalization to other spaces $\mathcal{G}$
of risk has been proposed by Delbaen [123].

Let us consider a risky position with terminal value $X$ and a capital $a$
invested in the risk-free asset at the beginning of the time period. At the end
of the time period, $a$ becomes $a \cdot (1 + \mu_0)$, where $\mu_0$ is the risk-free interest
rate. Then,


Axiom 1 (Translational Invariance) 
$$\forall X \in \mathcal{G} \text{ and } \forall a \in \mathbb{R}, \quad \rho(X + a \cdot (1+\mu_0)) = \rho(X) - a . \tag{1.1}$$


This simply means that an investment of amount $a$ in the risk-free asset
decreases the risk by the same amount $a$. In particular, for any risky position
$X$, $\rho(X + \rho(X) \cdot (1+\mu_0)) = 0$, which expresses that investing an amount $\rho(X)$ in
the risk-free asset enables one to exactly make up for the risk of the position
$X$.

Let us now consider two risky investments $X_1$ and $X_2$, corresponding to
the positions of two traders of an investment house. It is important for the
supervisor that the aggregated risk of all traders be no larger than the sum of the
risks incurred by all traders. In particular, the risk associated with the position
$(X_1 + X_2)$ should be smaller than or equal to the sum of the separate risks
associated with the two positions $X_1$ and $X_2$.


Axiom 2 (Sub-additivity) 
$$\forall (X_1, X_2) \in \mathcal{G} \times \mathcal{G}, \quad \rho(X_1 + X_2) \le \rho(X_1) + \rho(X_2) . \tag{1.2}$$


The condition of sub-additivity encourages a portfolio manager to aggregate
her different positions by diversification to minimize her overall risk. This
axiom is probably the most debated among the four axioms underlying the
theory of coherent measures of risk (see [131] and references therein). As an
example, the VaR is well known to lack sub-additivity. At the same time, VaR
is comonotonically additive, which means that the VaR of two comonotonic
assets equals the sum of the VaR of each individual asset. But, since the 
comonotonicity represents the strongest kind of dependence (see Chap. 3), 
it is particularly disturbing to imagine that one can find situations where a 
portfolio made of two comonotonic assets is less risky than a portfolio with 
assets whose marginal risks are the same as in the previous situation but with 
a weaker dependence. Here is the rub with sub-additivity. 
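The lack of sub-additivity of the VaR, and the discomfort it creates when comparing comonotonic and weakly dependent assets, can be sketched with a standard textbook example of two loans. All numbers are hypothetical, and the VaR is taken here as a plain quantile of the loss distribution without discounting, which is an assumption and may differ from the definition used in Chap. 3.

```python
from itertools import product

DEFAULT_PROB = 0.04   # hypothetical default probability of each loan
LOSS = 100.0          # hypothetical loss given default
LEVEL = 0.95          # confidence level of the VaR


def value_at_risk(dist, level):
    """Smallest loss l such that P(loss <= l) >= level.

    `dist` maps each possible loss to its probability. This quantile
    convention is an assumption; discounting is ignored.
    """
    cumulative = 0.0
    for loss in sorted(dist):
        cumulative += dist[loss]
        if cumulative >= level - 1e-12:
            return loss
    return max(dist)


single = {0.0: 1 - DEFAULT_PROB, LOSS: DEFAULT_PROB}

# Portfolio of two independent loans: convolve the two loss distributions.
independent = {}
for (l1, p1), (l2, p2) in product(single.items(), repeat=2):
    independent[l1 + l2] = independent.get(l1 + l2, 0.0) + p1 * p2

# Portfolio of two comonotonic loans: both default in the same states.
comonotonic = {0.0: 1 - DEFAULT_PROB, 2 * LOSS: DEFAULT_PROB}

var_single = value_at_risk(single, LEVEL)
var_independent = value_at_risk(independent, LEVEL)
var_comonotonic = value_at_risk(comonotonic, LEVEL)
print(var_single, var_independent, var_comonotonic)
```

In this sketch the VaR of the independent portfolio exceeds the sum of the two stand-alone VaRs, violating (1.2), while the comonotonic portfolio, which carries the strongest dependence, is assigned exactly the sum of the stand-alone VaRs and thus looks less risky.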
Axiom 3 (Positive Homogeneity)
$$\forall X \in \mathcal{G} \text{ and } \forall \lambda \ge 0, \quad \rho(\lambda \cdot X) = \lambda \cdot \rho(X) . \tag{1.3}$$


This third axiom stresses the importance of homogeneity. Indeed, it means 
that the risk associated with a given position increases with its size, here 
proportionally with it. Again, this axiom is controversial. Obviously, one can 
assert that the risk associated with the position $2 \cdot X$ is naturally twice as
large as the risk of $X$. This is true as long as we can consider that a large
position can be cleared as easily as a smaller one. However, this is not realistic
because of the limited liquidity of real markets; a large position in a given
asset is more risky than the sum of the risks associated with the many smaller 
positions which add up to the large position. 

Finally, if it is true that, for all possible states of nature, the risk of $X$
leads to a loss larger than that of $Y$ (i.e., all components of the vector $X$ in
$\mathbb{R}^N$ are always less than or equal to those of the vector $Y$), the risk measure
$\rho(X)$ must be larger than or equal to $\rho(Y)$:


Axiom 4 (Monotony) 
$$\forall X, Y \in \mathcal{G} \text{ such that } X \le Y, \quad \rho(X) \ge \rho(Y) . \tag{1.4}$$


These four axioms define the coherent measures of risks, which admit the 
following general representation: 


$$\rho(X) = \sup_{P \in \mathcal{P}} E_P\left[ \frac{-X}{1+\mu_0} \right] , \tag{1.5}$$


where $\mathcal{P}$ denotes a family of probability measures. Thus, any coherent measure
of risk appears as the maximum of the expected (discounted) loss over a given set of
scenarios (the different probability measures $P \in \mathcal{P}$). It is then obvious that
the larger the set of scenarios, the larger the value of $\rho(X)$ and thus, the more
conservative the risk measure.
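A minimal numerical sketch of representation (1.5) on a finite state space, assuming a hypothetical set of three scenarios on four states of nature and a hypothetical risk-free rate. It evaluates the worst-case expected discounted loss and checks the four axioms on one pair of positions; this is a spot check under stated assumptions, not a proof.

```python
MU0 = 0.02  # hypothetical risk-free rate

# Hypothetical scenario set: three probability vectors on four states of nature.
SCENARIOS = [
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.20, 0.30, 0.40],
    [0.40, 0.30, 0.20, 0.10],
]


def rho(x, scenarios=SCENARIOS, mu0=MU0):
    """Largest expected discounted loss of position x over the scenario set."""
    return max(
        sum(p * -xi for p, xi in zip(probs, x)) / (1.0 + mu0)
        for probs in scenarios
    )


X = [-10.0, 2.0, 5.0, 8.0]   # terminal values of a position in each state
Y = [-4.0, 3.0, 5.0, 9.0]    # dominates X in every state
a, lam, tol = 3.0, 2.5, 1e-9

shifted = [xi + a * (1.0 + MU0) for xi in X]
summed = [xi + yi for xi, yi in zip(X, Y)]
scaled = [lam * xi for xi in X]

checks = {
    "translational invariance": abs(rho(shifted) - (rho(X) - a)) < tol,
    "sub-additivity": rho(summed) <= rho(X) + rho(Y) + tol,
    "positive homogeneity": abs(rho(scaled) - lam * rho(X)) < tol,
    "monotony": rho(X) >= rho(Y) - tol,
    "larger scenario set is more conservative": rho(X, SCENARIOS[:1]) <= rho(X) + tol,
}
print(checks)
```

The last entry illustrates the remark on conservatism: restricting the scenario set to its first element can only lower the measured risk.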

It is particularly interesting that expression (1.5) is very similar to the 
result obtained in the theory of utility with non-additive probabilities [202, 
203]. Indeed, in such a case, the utility of position X is given by 


$$U(X) = \inf_{P \in \mathcal{P}} E_P[u(X)] , \tag{1.6}$$
where $u(\cdot)$ is a usual utility function.

When the coherent risk measure is invariant in law and comonotonically
additive, an alternative representation holds in terms of the spectral measure
of risk [285, 471]

$$\rho(X) = p \int_0^1 \mathrm{VaR}_\alpha(X) \, dF(\alpha) + (1-p) \, \mathrm{VaR}_1(X) , \tag{1.7}$$
where $F$ is a continuous convex distribution function on $[0,1]$, $p$ is any real
in $[0,1]$ and $\mathrm{VaR}_\alpha$ is the Value-at-Risk defined in (3.85) page 125. Therefore,
most coherent measures of risk appear as a convex sum of $\mathrm{VaR}_\alpha$ (a non-coherent
risk measure) at different probability levels. The weighting function
$F$ can be interpreted as a distortion of the objective probabilities, as underlined
in the non-expected utility context [431, 495].
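As a hedged illustration of (1.7), consider the average of the VaR over all levels above 95%, often called the expected shortfall. Assuming that $\mathrm{VaR}_\alpha$ is the $\alpha$-quantile of the loss (a convention that may differ from (3.85)), this corresponds to $p = 1$ and to the continuous convex weighting $F(\alpha) = \max(0, (\alpha - 0.95)/0.05)$. The loss distributions below are hypothetical: two loans, each losing 100 with probability 4%, held either independently or comonotonically.

```python
def expected_shortfall(dist, level):
    """Average of the loss quantiles above `level`.

    `dist` maps each possible loss to its probability. Averaging the
    quantiles over [level, 1] amounts to averaging the worst (1 - level)
    fraction of outcomes.
    """
    tail = 1.0 - level
    remaining, total = tail, 0.0
    for loss in sorted(dist, reverse=True):
        mass = min(dist[loss], remaining)
        total += mass * loss
        remaining -= mass
        if remaining <= 0.0:
            break
    return total / tail


# Hypothetical loans: loss of 100 with probability 4%, nothing otherwise.
single = {0.0: 0.96, 100.0: 0.04}
independent = {0.0: 0.96 * 0.96, 100.0: 2 * 0.96 * 0.04, 200.0: 0.04 * 0.04}
comonotonic = {0.0: 0.96, 200.0: 0.04}

es_single = expected_shortfall(single, 0.95)
es_independent = expected_shortfall(independent, 0.95)
es_comonotonic = expected_shortfall(comonotonic, 0.95)
print(es_single, es_independent, es_comonotonic)
```

In this example the weighted average of VaRs is sub-additive for the independent loans and additive for the comonotonic ones, in line with the properties required of the representation.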

Coherent measures of risk can be generalized to define the so-called convex
measures of risk by replacing the controversial Axioms 2 and 3 by a single
axiom of convexity of the risk measure [174, 175]. In the case where the risk 
measure is still positively homogeneous, this requirement is equivalent to the 
sub-additivity, but it becomes less restrictive when Axiom 3 is discarded. 
Then, one obtains the following representation of the convex risk measures: 


$$\rho(X) = \sup_{P \in \mathcal{M}} \left\{ E_P\left[ \frac{-X}{1+\mu_0} \right] - \alpha(P) \right\} , \tag{1.8}$$

where $\mathcal{M}$ is the set of all probability measures on $(\Omega, \mathcal{F})$, and $\mathcal{F}$ denotes a
$\sigma$-algebra on the state space $\Omega$. More generally, $\mathcal{M}$ is the set of all finitely
additive and non-negative set functions $P$ on $\mathcal{F}$ satisfying $P(\Omega) = 1$ and the
functional

$$\alpha(P) = \sup_{\{X \in \mathcal{G} \,|\, \rho(X) \le 0\}} E_P\left[ \frac{-X}{1+\mu_0} \right] \tag{1.9}$$

is a penalty function that fully characterizes the convex measure of risk. In
the case of a coherent risk measure, the set $\mathcal{P}$ (in (1.5)) is in fact the class of
set functions $P$ in $\mathcal{M}$ such that the penalty function vanishes: $\alpha(P) = 0$.
Another alternative leads one to replace Axiom 4 by the following: 


Axiom 5 (Expectation-Boundedness)
$$\forall X \in \mathcal{G}, \quad \rho(X) \ge \frac{E[-X]}{1+\mu_0} , \tag{1.10}$$


where the equality holds if and only if $X$ is certain.¹ Then, together with
Axioms 1-3, it allows one to define the expectation-bounded risk measures
[407]. They are particularly interesting insofar as they enable one to capture 
the inherent relationship existing between the assessment of risk in terms of 
economic capital and the measure of risk in terms of deviations from a target 
objective, as we shall see hereafter. 


1.2.2 Consistent Measures of Risks and Deviation Measures 


We now present a slightly different approach, which we think offers a suitable
complement to coherent (and/or convex) risk measures for financial investments,
and in particular for portfolio risk assessments. These measures are
called “consistent measures of risks” in [333] and “general deviation measures”
in [407]. As before, we consider the future value of a risky position denoted
by $X$, and we call $\mathcal{G}$ the space of risks.

¹ We say that $X$ is certain if $X(\omega) = a$, for some $a \in \mathbb{R}$, for all $\omega \in \Omega$ such that
$P(\omega) \neq 0$, where $P$ denotes a probability measure on $(\Omega, \mathcal{F})$ and $\mathcal{F}$ is a $\sigma$-algebra
so that $(\Omega, \mathcal{F}, P)$ is a usual probability space.

Let us first require that the risk measure $\rho(\cdot)$, which is a functional on $\mathcal{G}$,
always remains positive:


Axiom 6 (Positivity)

$$\forall X \in \mathcal{G}, \quad \rho(X) \ge 0 , \tag{1.11}$$


where the equality holds if and only if $X$ is certain. Let us now add to this
position a given amount $a$ invested in the risk-free asset whose return is $\mu_0$
(with therefore no randomness in its price trajectory) and define the future
wealth of the new position $Y = X + a(1+\mu_0)$. Since $\mu_0$ is non-random,
the fluctuations of $X$ and $Y$ are the same. Thus, it is desirable that $\rho$
enjoys a property of translational invariance, whatever $X$ and the non-random
coefficient $a$ may be:

$$\forall X \in \mathcal{G},\ \forall a \in \mathbb{R}, \quad \rho(X + (1+\mu_0) \cdot a) = \rho(X) . \tag{1.12}$$

This relation is obviously true for all $\mu_0$ and $a$; therefore, we set


Axiom 7 (Translational Invariance)

$$\forall X \in \mathcal{G},\ \forall \kappa \in \mathbb{R}, \quad \rho(X + \kappa) = \rho(X) . \tag{1.13}$$


We also require that the risk measure increases with the quantity of assets
held in the portfolio. This assumption reads

$$\forall X \in \mathcal{G},\ \forall \lambda \in \mathbb{R}_+, \quad \rho(\lambda \cdot X) = f(\lambda) \cdot \rho(X) , \tag{1.14}$$

where the function $f : \mathbb{R}_+ \to \mathbb{R}_+$ is increasing and convex to account for
liquidity risk, as previously discussed. In fact, it is straightforward to show²
that the only functions satisfying this requirement are the functions
$f_\zeta(\lambda) = \lambda^\zeta$ with $\zeta \ge 1$, so that Axiom 3 can be reformulated in terms of
positive homogeneity of degree $\zeta$:


Axiom 8 (Positive Homogeneity)

$$\forall X \in \mathcal{G},\ \forall \lambda \in \mathbb{R}_+, \quad \rho(\lambda \cdot X) = \lambda^\zeta \cdot \rho(X) . \tag{1.15}$$


Note that the case of liquid markets is recovered by $\zeta = 1$ for which the risk is
directly proportional to the size of the position, as in the case of the coherent
risk measures.

These axioms, which define the so-called consistent measures of risk [333],
can easily be extended to the risk measures associated with the return on the
risky position. Indeed, a one-period return is nothing but the variation of the
value of the position divided by its initial value $X_0$. One can thus easily check
that the risk defined on the risky position is $[X_0]^\zeta$ times the risk defined on
the return distribution. In the following, we will only consider the risk defined
on the return distribution and, to simplify the notations, the symbol $X$ will be
used to denote both the asset price and its return in their respective context
without ambiguity.

² Using the trick $\rho(\lambda_1 \lambda_2 \cdot X) = f(\lambda_1) \cdot \rho(\lambda_2 \cdot X) = f(\lambda_1) \cdot f(\lambda_2) \cdot \rho(X) = f(\lambda_1 \cdot \lambda_2) \cdot \rho(X)$,
leading to $f(\lambda_1 \cdot \lambda_2) = f(\lambda_1) \cdot f(\lambda_2)$. The unique increasing convex solution of
this functional equation is $f_\zeta(\lambda) = \lambda^\zeta$ with $\zeta \ge 1$.

Now, restricting to the case of a perfectly liquid market ($\zeta = 1$) and adding
a sub-additivity assumption


Axiom 9 (Sub-additivity)

$$\forall (X, Y) \in \mathcal{G} \times \mathcal{G}, \quad \rho(X + Y) \le \rho(X) + \rho(Y) , \tag{1.16}$$


one obtains the so-called general deviation measures [407]. Again, this axiom is 
open to controversy and its main raison d’étre is to ensure the well-posedness 
of optimization problems (such as minimizing portfolio risks). It could be 
weakened along the lines used previously to derive the convex measures of 
risk from the coherent measures of risk. 

One can easily check that the deviation measures defined in (1.16) correspond
one-to-one to the expectation-bounded measures of risk defined in
(1.10) through the relation

$$\tilde{\rho}(X) = \rho(X) + \frac{E[-X]}{1+\mu_0} \iff \rho(X) = \tilde{\rho}(X - E[X]) , \tag{1.17}$$

where $\rho$ denotes the deviation measure and $\tilde{\rho}$ the associated
expectation-bounded measure of risk. It follows straightforwardly that minimizing
the risk of a portfolio (measured either by $\rho$ or by $\tilde{\rho}$) under constraints
on the expected return is equivalent, as long as the constraints on the expected
return are active. Indeed, in such a case, searching for the minimum of $\rho(X)$ or
of $\rho(X) + \frac{E[-X]}{1+\mu_0}$ is the same problem since the value of the expected
return is fixed by the constraints.

Additionally, it can be shown that the expectation-bounded measure of
risk $\tilde{\rho}$ defined by (1.17) is coherent if (and only if) the deviation measure $\rho$
satisfies [407]

$$\forall X \in \mathcal{G}, \quad \rho(X) \le E[X] - \inf X . \tag{1.18}$$
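The correspondence (1.17) and the bound (1.18) can be checked on a small discrete example. The sketch below is only illustrative: the risk-free return `MU0`, the helper names, and the ten-point sample are assumptions, and expectations are taken under the uniform measure on the sample. The standard deviation and the lower semi-deviation are used as two deviation measures.

```python
import numpy as np

MU0 = 0.02  # hypothetical risk-free return

def std_dev(x):
    """Standard deviation under the uniform measure on the sample."""
    return float(np.std(np.asarray(x, dtype=float)))

def lower_semi_dev(x):
    """Square root of the mean squared shortfall below the mean."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(np.minimum(x - x.mean(), 0.0) ** 2)))

def expectation_bounded(dev, x, mu0=MU0):
    """Left-hand side of (1.17): deviation plus discounted expected loss."""
    x = np.asarray(x, dtype=float)
    return dev(x) + float(np.mean(-x)) / (1.0 + mu0)

def satisfies_bound(dev, x):
    """Check (1.18): dev(X) <= E[X] - inf X (with a small numerical slack)."""
    x = np.asarray(x, dtype=float)
    return dev(x) <= float(x.mean() - x.min()) + 1e-12

x = np.array([0.0] * 9 + [10.0])   # mostly flat payoff with one large gain
centered = x - x.mean()
round_trip = expectation_bounded(std_dev, centered)   # right-hand side of (1.17)
```

On this sample the standard deviation equals 3 while $E[X] - \inf X = 1$, so the standard deviation violates (1.18), whereas the lower semi-deviation (about 0.95) satisfies it. This is consistent with the remark that (1.18) is a genuine restriction on the deviation measure.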


The general representation of the deviation measures satisfying this restriction
can be easily derived from the representation of coherent risk measures.
When such a requirement is not fulfilled, one can still have the following
representation:³

$$\rho(X) = \sup_{Y \in \mathcal{Y}} E[Y \cdot (E[X] - X)] = \sup_{Y \in \mathcal{Y}} \mathrm{Cov}(-X, Y) , \tag{1.19}$$

where $\mathcal{Y}$ is a closed and convex subset of $L^2(\Omega)$ such that

1. $1 \in \mathcal{Y}$,
2. $\forall Y \in \mathcal{Y}$, $E[Y] = 1$,
3. $\forall X \in L^2(\Omega)$ that is not constant, $\exists Y \in \mathcal{Y}$ such that $E[Y \cdot X] < E[X]$.

³ Strictly speaking, this representation only holds for lower semicontinuous deviation
measures, i.e., deviation measures such that the sets $\{X \,|\, \rho(X) \le \epsilon\}$ are closed in
$L^2(\Omega)$, for all $\epsilon > 0$. This condition is fulfilled by most of the deviation measures
of common use: the standard deviation, the semi-standard deviation, the absolute
deviation, and so on.


When the random variables in $\mathcal{Y}$ are all positive, they can be interpreted as
density functions relative to some reference probability measure $P_0$ on $(\Omega, \mathcal{F})$
(the objective probability measure). Thus, the term $E[Y \cdot X]$ is nothing but
the expectation of $X$ under the probability measure $P$ whose Radon-Nikodym
density is $dP/dP_0 = Y$. Therefore, one obtains a deviation measure associated with
a coherent measure of risk.
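For a concrete instance of (1.19), take the standard deviation. A natural candidate for the set is $\mathcal{Y} = \{Y : E[Y] = 1,\ \sigma(Y) \le 1\}$; this choice is an assumption of the sketch, motivated by the Cauchy-Schwarz inequality rather than stated in the text. The supremum of $\mathrm{Cov}(-X, Y)$ is then reached at $Y^* = 1 - (X - E[X])/\sigma(X)$ and equals $\sigma(X)$. The code checks this under the empirical measure of a synthetic sample; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_t(df=5, size=10_000)     # synthetic returns, illustration only

def cov_neg_x(y, x):
    """Cov(-X, Y) under the empirical measure of the sample."""
    return float(np.mean(-(x - x.mean()) * (y - y.mean())))

sigma_x = float(np.std(x))
y_star = 1.0 - (x - x.mean()) / sigma_x   # candidate maximizer: E[Y] = 1, std(Y) = 1

def random_member(rng, n):
    """Draw some Y with E[Y] = 1 and std(Y) <= 1 (one way to sample the set)."""
    z = rng.standard_normal(n)
    z = (z - z.mean()) / z.std()
    return 1.0 + rng.uniform(0.0, 1.0) * z

trial_values = [cov_neg_x(random_member(rng, x.size), x) for _ in range(200)]
sup_value = cov_neg_x(y_star, x)
```

Note that $Y^*$ takes negative values whenever $X$ exceeds its mean by more than $\sigma(X)$, so it is not a density; this matches the observation that the standard deviation is not associated with a coherent measure of risk.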

These derivations show that deviation measures of risk on the one hand and 
coherent (or convex/expectation-bounded) measures of risk on the other hand 
are inextricably entangled. In fact, they are the two sides of the same coin, 
as mentioned in the introduction to this section. The various representation 
theorems show that, in most cases, these risk measures can be interpreted as 
worst-case scenarios, which rationalizes the use of stress-testing procedures as 
a sound practice for risk management. 

In the more general case when the exponent $\zeta$ defined in Axiom 8 is no
longer equal to 1, and more precisely, when we only require that Axioms 6-8
hold, there is no general representation for the consistent risk measures to
the best of our knowledge. The risk measures $\rho$ obeying Axioms 7 and 8 are
known as the semi-invariants of the distribution of returns of X (see [465, 
pp. 86-87]). Among the large family of semi-invariants, we can cite the well- 
known centered moments and cumulants of X (including the usual variance). 
They are interesting cases that we discuss further below. 


1.2.3 Examples of Consistent Measures of Risk 


The set of risk measures obeying Axioms 7-8 is huge since it includes all the
homogeneous functionals of $(X - E[X])$, for instance. The centered moments
(or moments about the mean) and the cumulants are two well-known classes
of semi-invariants. Then, a given value of $\zeta$ can be seen as nothing but a
specific choice of the order $n$ of the centered moments or of the cumulants.⁴
In this case, the risk measure defined via these semi-invariants fulfills the two
following conditions:

$$\rho(X + \mu) = \rho(X) , \tag{1.20}$$
$$\rho(\lambda \cdot X) = \lambda^n \cdot \rho(X) . \tag{1.21}$$

⁴ The relevance of the moments of high order for the assessment of large risks is
discussed in Appendix 1.A.

In order to satisfy the positivity condition (Axiom 6), one needs to restrict
the set of values taken by n. By construction, the centered moments of even 
order are always positive while the odd order centered moments can be neg- 
ative. In addition, a vanishing value of an odd order moment does not mean 
that the random variable, or risk, $X \in \mathcal{G}$ is certain in the sense of footnote 1,
since for instance any symmetric random variable has vanishing odd order
moments. Thus, only the even-order centered moments seem acceptable risk
measures. However, this restrictive constraint can be relaxed by first recalling
that, given any homogeneous function $f(\cdot)$ of order $p$, the function $f(\cdot)^q$ is
also homogeneous of order $p \cdot q$. This allows one to decouple the order of the
moments to consider, which quantifies the impact of the large fluctuations,
from the influence of the size of the positions held, measured by the degree
of homogeneity of the measure $\rho$. Thus, considering any even-order centered
moments, we can build a risk measure $\rho(X) = E\left[(X - E[X])^{2n}\right]^{\zeta/2n}$ which
accounts for the fluctuations measured by the centered moment of order $2n$
but with a degree of homogeneity equal to $\zeta$.

A further generalization is possible for odd-order moments. Indeed, the
absolute centered moments satisfy the three Axioms 6-8 for any odd or even
order. So, we can even go one step further and use non-integer order absolute
centered moments, and define the more general risk measure

$$\rho(X) = E\left[|X - E[X]|^\gamma\right]^{\zeta/\gamma} , \tag{1.22}$$

where $\gamma$ denotes any positive real number.

Due to the Minkowski inequality, these risk measures are convex for any
$\zeta$ and $\gamma$ larger than 1 (and for $0 \le u \le 1$):

$$\rho(u \cdot X + (1-u) \cdot Y) \le u \cdot \rho(X) + (1-u) \cdot \rho(Y) , \tag{1.23}$$

which ensures that aggregating two risky assets diversifies their risk. In fact,
in the special case $\zeta = 1$, these measures enjoy the stronger sub-additivity
property, and therefore belong to the class of general deviation measures.
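The properties claimed for the measure (1.22) are easy to probe numerically. The following sketch estimates $\rho$ on synthetic samples (the distributions, the parameter values and the function name are assumptions chosen for illustration) and evaluates translational invariance (Axiom 7), homogeneity of degree $\zeta$ (Axiom 8), the convexity inequality (1.23), and sub-additivity for $\zeta = 1$.

```python
import numpy as np

def fluctuation_measure(x, gamma, zeta):
    """rho(X) = E[|X - E[X]|^gamma]^(zeta/gamma), estimated on a sample."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.abs(x - x.mean()) ** gamma) ** (zeta / gamma))

rng = np.random.default_rng(2)
x = rng.standard_t(df=6, size=50_000)        # synthetic fat-tailed returns
y = 0.5 * x + rng.standard_normal(50_000)    # a second, correlated synthetic asset
gamma, zeta, lam, u = 3.5, 2.0, 1.7, 0.4     # illustrative parameter values

rho_x = fluctuation_measure(x, gamma, zeta)
rho_y = fluctuation_measure(y, gamma, zeta)
shifted = fluctuation_measure(x + 10.0, gamma, zeta)             # Axiom 7
scaled = fluctuation_measure(lam * x, gamma, zeta)               # Axiom 8
mixed = fluctuation_measure(u * x + (1 - u) * y, gamma, zeta)    # (1.23)

# Sub-additivity in the degree-one case (zeta = 1)
dev_sum = fluctuation_measure(x + y, gamma, 1.0)
dev_parts = fluctuation_measure(x, gamma, 1.0) + fluctuation_measure(y, gamma, 1.0)
```

Because the sample average is itself an expectation under the empirical measure, the inequalities hold exactly on the sample, not only in the large-sample limit.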

More generally, any discrete or continuous (positive) sum of these risk 
measures with the same degree of homogeneity is again a risk measure. 
This allows us to define “spectral measures of fluctuations” in the spirit of 
Acerbi [2]: 


$$\rho(X) = \int d\gamma\, \phi(\gamma)\, E\left[|X - E[X]|^\gamma\right]^{\zeta/\gamma} , \tag{1.24}$$


where $\phi$ is a positive real-valued function defined on any subinterval of $[1, \infty)$,
such that the integral in (1.24) remains finite. It is sufficient to restrict the
construction of $\rho(X)$ to normalized functions $\phi$, such that $\int d\gamma\, \phi(\gamma) = 1$,
since the risk measures are defined up to a global normalization factor. Then,
$\phi(\gamma)$ represents the relative weight of the fluctuations measured by a given
moment order and can be considered as a measure of the risk aversion of the
risk manager with respect to large fluctuations.

The situation is not so clear for the cumulants, since the even-order cumulants,
as well as the odd-order ones, can be negative (even if, for a large class
of distributions, even-order cumulants remain positive, especially for fat-tailed 
distributions — even though there are simple but somewhat artificial counter- 
examples). In addition, cumulants suffer from another problem with respect 
to the positivity axiom. As for the odd-order centered moments, they can 
vanish even when the random variable is not certain. Just think of the cu- 
mulants of the Gaussian law. All but the first two (which represent the mean 
and the variance) are equal to zero. Thus, the strict formulation of the posi- 
tivity axiom cannot be fulfilled by the cumulants. Should we thus reject them 
as useful measures of risks? It is important to emphasize that the cumulants 
enjoy a property which can be considered as a natural requirement for a risk 
measure. It can be desirable that the risk associated with a portfolio made of 
independent assets is exactly the sum of the risk associated with each individual
asset. Thus, given $N$ independent assets $\{X_1, \ldots, X_N\}$, and the portfolio
$S_N = X_1 + \cdots + X_N$, we would like to have

$$\rho(S_N) = \rho(X_1) + \cdots + \rho(X_N) . \tag{1.25}$$

This property is verified for all cumulants, while it does not hold for centered
moments except the variance. In addition, as seen from their definition in
terms of the characteristic function

$$E\left[e^{ik \cdot X}\right] = \exp\left(\sum_{n=1}^{+\infty} \frac{(ik)^n}{n!}\, C_n\right) , \tag{1.26}$$

cumulants $C_n$ of order larger than 2 quantify deviations from the Gaussian law
and therefore measure large risks beyond the variance (equal to the second-order
cumulant).

What are the implications of using the cumulants as almost consistent 
measures of risks? In particular, what are the implications on the preferences 
of the agents employing such measures? To address this question, it is infor- 
mative to express the cumulants as a function of the centered moments. For 
instance, let us consider the fourth-order cumulant: 


$$C_4 = \mu_4 - 3 \cdot \mu_2^2 = \mu_4 - 3 \cdot C_2^2 , \tag{1.27}$$


where $\mu_n$ is the centered moment of order $n$. An agent assessing the fluctuations
of an asset with respect to $C_4$ exhibits an aversion for the fluctuations
quantified by the fourth central moment $\mu_4$, since $C_4$ increases with $\mu_4$, but
is attracted by the fluctuations measured by the variance, since $C_4$ decreases
with $\mu_2$. This behavior is not irrational because it remains globally risk-averse.
Indeed, it depicts an agent which tries to avoid the larger risks but is ready to
accept the smallest ones. This kind of behavior is characteristic of any agent
using the cumulants as risk measures. In such a case, having $C_4 = 0$ does not
mean that the agent considers that the position is not risky (in the sense that
the position is certain) but that the agent is indifferent between the large risks
of this position measured by $\mu_4$ and the small risks quantified by $\mu_2$.
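The additivity property (1.25) and the expression (1.27) can be verified exactly on small discrete distributions, without sampling error, by building the law of the sum of two independent variables explicitly. The two distributions below and the helper names are hypothetical choices made for the illustration.

```python
import numpy as np

def central_moment(values, probs, order):
    """Centered moment of a discrete distribution."""
    mean = np.sum(values * probs)
    return float(np.sum(probs * (values - mean) ** order))

def fourth_cumulant(values, probs):
    """C_4 = mu_4 - 3 * mu_2^2, as in (1.27)."""
    mu2 = central_moment(values, probs, 2)
    mu4 = central_moment(values, probs, 4)
    return mu4 - 3.0 * mu2 ** 2

def sum_of_independent(v1, p1, v2, p2):
    """Law of X1 + X2 for independent X1, X2 (atoms may repeat, which is harmless)."""
    values = np.add.outer(v1, v2).ravel()
    probs = np.multiply.outer(p1, p2).ravel()
    return values, probs

v1, p1 = np.array([-1.0, 0.0, 2.0]), np.array([0.3, 0.5, 0.2])   # hypothetical asset 1
v2, p2 = np.array([-3.0, 1.0]), np.array([0.25, 0.75])           # hypothetical asset 2
vs, ps = sum_of_independent(v1, p1, v2, p2)

c4_gap = fourth_cumulant(vs, ps) - (fourth_cumulant(v1, p1) + fourth_cumulant(v2, p2))
mu4_gap = central_moment(vs, ps, 4) - (central_moment(v1, p1, 4) + central_moment(v2, p2, 4))
```

Here `c4_gap` vanishes up to rounding, while `mu4_gap` equals $6\, \mu_2(X_1)\, \mu_2(X_2)$, the cross term that makes the fourth centered moment non-additive.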

To summarize, centered moments of even orders possess all the minimal 
properties required for a suitable portfolio risk measure. Cumulants only par- 
tially fulfill these requirements, but have an additional advantage compared 
with the centered moments, that is, they fulfill the condition (1.25). For these 
reasons, we think it is interesting to consider both the centered moments and 
the cumulants in risk analysis and decision making. Finally let us stress that 
the variance, originally used in Markowitz’s portfolio theory [347], is nothing 
but the second centered moment, also equal to the second-order cumulant (the
second- and third-order cumulants coincide with the corresponding centered
moments). Therefore, a portfolio
theory based on the centered moments or on the cumulants automatically
contains Markowitz’s theory as a special case, and thus offers a natural gen- 
eralization encompassing large risks of this masterpiece of financial science. It 
also embodies several other generalizations where homogeneous measures of 
risks are considered, as for instance in [241]. 

We should also mention the measure of attractiveness for risky investments,
the gain-loss ratio, introduced by Bernardo and Ledoit [50]. The gain
(loss) of a portfolio is the expectation, under a benchmark risk-adjusted
probability measure, of the positive (negative) part of the portfolio’s excess payoff.
The gain-loss ratio constitutes an improvement over the widely used Sharpe
ratio (average return over volatility). The advantage of the gain-loss ratio is
that it penalizes only downside risk (losses) and rewards all upside potential
(gains). The gain-loss ratio has been shown to yield useful bounds for asset
pricing in incomplete markets that give the modeler the flexibility to control
the trade-off between the precision of equilibrium models and the credibility
of no-arbitrage methods. The gain-loss approach is valuable in applications
where the security returns are not normally distributed. Bernardo and Ledoit
[50] cite the following domains of application: (i) valuing real options on
non-traded assets; (ii) valuing executive stock options when the executive cannot
trade the options or the underlying due to insider trading restrictions; (iii)
evaluating the performance of portfolio managers who invest in derivatives;
(iv) pricing options on a security whose price follows a jump-diffusion or a
fat-tailed Pareto-Levy diffusion process; and (v) pricing fixed income derivatives
in the presence of default risk.
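A minimal sketch of the gain-loss ratio on a discrete set of scenarios is given below. The benchmark risk-adjusted measure is not specified in this section, so the code takes it as an optional vector of scenario weights and falls back to equal weights; that default, the function names, and the four-scenario payoff are assumptions made only for illustration.

```python
import numpy as np

def gain_loss_ratio(excess_payoff, benchmark_probs=None):
    """Expected positive part over expected negative part of the excess payoff."""
    x = np.asarray(excess_payoff, dtype=float)
    if benchmark_probs is None:
        q = np.full(x.size, 1.0 / x.size)   # assumption: equal scenario weights
    else:
        q = np.asarray(benchmark_probs, dtype=float)
        q = q / q.sum()                      # normalize to a probability vector
    gain = float(np.sum(q * np.maximum(x, 0.0)))
    loss = float(np.sum(q * np.maximum(-x, 0.0)))
    return gain / loss if loss > 0 else float("inf")

def sharpe_ratio(excess_payoff):
    """Average excess payoff over its volatility, for comparison."""
    x = np.asarray(excess_payoff, dtype=float)
    return float(x.mean() / x.std())

payoff = [2.0, -1.0, 3.0, -2.0]              # hypothetical excess payoffs
ratio_equal = gain_loss_ratio(payoff)
ratio_tilted = gain_loss_ratio(payoff, [0.1, 0.4, 0.1, 0.4])
```

Shifting benchmark weight toward the loss scenarios lowers the ratio, which shows how the choice of the benchmark measure drives the assessment.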


1.3 Origin of Risk and Dependence 
1.3.1 The CAPM View 


Our purpose is not to review the huge literature on the origin of risks and their 
underlying mechanisms, but to suggest guidelines for further understanding. 
For enticing introductions and synopses, we refer to the very readable books of 
Bernstein [51, 52]. In [51], Bernstein reviews the history, since ancient times, 
of those thinkers who showed how to quantify risk:

The capacity to manage risk, and with it the appetite to take risk
and make forward-looking choices, are key elements [...] that drive the
economic system forward.


The concept of risks in economics and finance is elaborated in [52], starting 
with the origins of the Cowles foundation as the consequence of Cowles’s 
personal interest in the question: Are stock prices predictable? In the words 
of J.L. McCauley (see his customer review on www.amazon.com), 


this book is all about heroes and heroic ideas, and Bernstein’s heroes 
are Adam Smith, Bachelier, Cowles, Markowitz (and Roy), Sharpe, 
Arrow and Debreu, Samuelson, Fama, Tobin, Samuelson, Markowitz, 
Miller and Modigliani, Treynor, Samuelson, Osborne, Wells-Fargo 
Bank (McQuown, Vertin, Fouse and the origin of index funds), Ross, 
Black, Scholes, and Merton. The final heroes (see Chap. 14, The Ultimate
Invention) are the inventors of (synthetic) portfolio insurance
(replication/synthetic options). 


One of these achievements is the capital asset pricing model (CAPM), 
which is probably still the most widely used approach to relative asset val- 
uation, although its empirical roots have been found to be weaker in recent 
years [59, 160, 223, 287, 306, 401]. Its major idea was that priced risk cannot 
be diversified and cannot be eliminated through portfolio aggregation. This 
asset valuation model describing the relationship between expected risk and 
expected return for marketable assets is strongly entangled with the Mean- 
Variance Portfolio Model of Markowitz. Indeed, both of them fundamentally 
rely on the description of the probability distribution function (pdf) of as- 
set returns in terms of Gaussian functions. The mean-variance description is 
thus at the basis of the Markowitz portfolio theory and of the CAPM and its 
inter-temporal generalization (see for instance [359]). 

The CAPM is based on the concept of economic equilibrium between ra- 
tional expectation agents. Economic equilibrium is itself the self-organized 
result of complex nonlinear feedback processes between competitive inter- 
acting agents. Thus, while not describing the specific dynamics of how self- 
organization makes the economy converge to a stable regime [10, 18, 280], the 
concept of economic equilibrium describes the resulting state of this dynamic 
self-organization and embodies all the hidden and complex interactions be- 
tween agents with infinite loops of recurrence. This provides a reference base 
for understanding risks. 

We put some emphasis on the CAPM and its generalized versions because 
the CAPM is a remarkable starting point for answering the question on the 
origin of risks and returns: in economic equilibrium theory, the two are con- 
ceived as intrinsically entangled. In the following, we expand on this class of 
explanation before exploring briefly other directions. 

Let us now show how an equilibrium model generalizing the original CAPM
[308, 364, 429] can be formulated on the basis of the coherence measures
adapted to large risks. This provides an “explanation” for risks from the
point of view of the non-separable interplay between agents’ preferences and
their collective organization. We should stress that many generalizations have
already been proposed to account for the fat-tailedness of the asset return
distributions, which led to the multimoments CAPM. For instance, Rubinstein
[421], Kraus and Litzenberger [278], Lim [306] and Harvey and Siddique [223]
have underlined and tested the role of the asymmetry in the risk premium
by accounting for the skewness of the distribution of returns. More recently,
Fang and Lai [162] and Hwang and Satchell [241] have introduced a four-moments
CAPM to take into account the leptokurtic behavior of the asset
return distributions. Many other extensions have been presented such as the
VaR-CAPM [3] or the Distributional-CAPM [389]. All these generalizations
become more complicated but unfortunately do not necessarily provide more
accurate predictions of the expected returns.

Let us assume that the relevant risk measure is given by any measure of 
fluctuations previously presented that obey the Axioms 6-8 of Sect. 1.2.2. 
We will also relax the usual assumption of a homogeneous market to give 
to the economic agents the choice of their own risk measure: some of them 
may choose a risk measure which puts the emphasis on the small fluctuations, 
while others may prefer those which account for the larger ones. In such a
heterogeneous market, we will recall how an equilibrium can still be reached
and why the excess returns of individual stocks remain proportional to the 
market excess return, which is the fundamental tenet of CAPM. 

For this, we need the following assumptions about the market: 


• H1: We consider a one-period market, such that all the positions held at
  the beginning of a period are cleared at the end of the same period.

• H2: The market is perfect, i.e., there are no transaction costs or taxes,
  the market is efficient and the investors can lend and borrow at the same
  risk-free rate $\mu_0$.


Of course, these standard assumptions are to be taken with a grain of salt 
and are made only with the goal of obtaining a normative reference theory. 
We will now add another assumption that specifies the behavior of the agents 
acting on the market, which will lead us to make the distinction between 
homogeneous and heterogeneous markets. 


Equilibrium in a Homogeneous Market 


The market is said to be homogeneous if all the agents acting on this market 
aim at fulfilling the same objective. This means that: 


• H3-1: All the agents want to maximize the expected return of their portfolio
  at the end of the period under a given constraint of measured risk,
  using the same measure of risks $\rho_\zeta$ for all of them (the subscript $\zeta$ refers
  to the degree of homogeneity of the risk measure, see Sect. 1.2).

In the special case where $\rho_\zeta$ denotes the variance, all the agents follow a
Markowitz’s optimization procedure, which leads to the CAPM equilibrium,
as proved by Sharpe [429]. When $\rho_\zeta$ represents the centered moments, this
leads to the market equilibrium described in [421]. Thus, this approach allows
for a generalization of the most popular asset pricing in equilibrium market
models.

When all the agents have the same risk function $\rho_\zeta$, whatever $\zeta$ may be,
we can assert that they have all a fraction of their capital invested in the same
portfolio $\Pi$ (see, for instance [333] for the derivation of the composition of
the portfolio), and the remaining in the risk-free asset. The amount of capital
invested in the risky fund only depends on their risk aversion and/or on the 
legal margin requirement they have to fulfill. 

Let us now assume that the market is at equilibrium, i.e., supply equals
demand. In such a case, since the optimal portfolios can be any linear combinations
of the risk-free asset and of the risky portfolio $\Pi$, it is straightforward
to show that the market portfolio, made of all traded assets in proportion
of their market capitalization, is nothing but the risky portfolio $\Pi$. Thus, as
shown in [333], we can state that, whatever the risk measure $\rho_\zeta$ chosen by
the agents to perform their optimization, the excess return of any asset $i$ over
the risk-free interest rate ($\mu(i) - \mu_0$) is proportional to the excess return of
the market portfolio $\Pi$ over the risk-free interest rate:

$$\mu(i) - \mu_0 = \beta^i_\zeta \cdot (\mu_\Pi - \mu_0) , \tag{1.28}$$

where

$$\beta^i_\zeta = \left. \frac{\partial \ln\left(\rho_\zeta^{1/\zeta}\right)}{\partial w_i} \right|_{w_1^*, \ldots, w_N^*} , \tag{1.29}$$

where $w_1^*, \ldots, w_N^*$ are the optimal allocations of the assets in the following
sense:

$$\begin{cases} \inf_{w_i \in [0,1]} \rho_\zeta(\{w_i\}) \\ \sum_i w_i = 1 \\ \sum_i w_i\, \mu(i) = \mu \end{cases} \tag{1.30}$$

In other words, the set of normalized weights $w_i^*$ defines the portfolio with minimum
risk as measured by any convex⁵ measure $\rho_\zeta$ of risk obeying Axioms 6-8
of Sect. 1.2.2 for a given amount of expected return $\mu$.
When $\rho_\zeta$ denotes the variance, we recover the usual $\beta^i$ given by the
mean-variance approach:

$$\beta^i = \frac{\mathrm{Cov}(X_i, \Pi)}{\mathrm{Var}(\Pi)} . \tag{1.31}$$

⁵ Convexity is necessary to ensure the existence and the uniqueness of a minimum.

Thus, the relations (1.28) and (1.29) generalize the usual CAPM formula,
showing that the specific choice of the risk measure is not very important,
as long as it follows the Axioms 6-8 characterizing the fluctuations of the
distribution of asset returns.
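The reduction of (1.29) to (1.31) in the variance case ($\zeta = 2$) can be checked with a finite-difference gradient. In the sketch below the covariance matrix and the weights are hypothetical, and the weights are simply given rather than obtained by solving the program (1.30); the derivative identity being tested holds at any portfolio, so this simplification does not affect the check.

```python
import numpy as np

def variance_risk(w, cov):
    """Portfolio variance w' C w, a risk measure homogeneous of degree 2."""
    return float(w @ cov @ w)

def beta_from_risk(risk, w, zeta, eps=1e-6):
    """Central finite-difference estimate of d ln(rho^(1/zeta)) / d w_i, as in (1.29)."""
    w = np.asarray(w, dtype=float)
    betas = np.empty_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        up = np.log(risk(w + step)) / zeta
        down = np.log(risk(w - step)) / zeta
        betas[i] = (up - down) / (2.0 * eps)
    return betas

cov = np.array([[0.040, 0.006, 0.010],      # hypothetical covariance matrix
                [0.006, 0.090, 0.012],
                [0.010, 0.012, 0.160]])
weights = np.array([0.5, 0.3, 0.2])         # hypothetical portfolio weights

beta_general = beta_from_risk(lambda w: variance_risk(w, cov), weights, zeta=2.0)
beta_classic = cov @ weights / variance_risk(weights, cov)   # Cov(X_i, Pi) / Var(Pi)
```

The same `beta_from_risk` helper can be applied to any other positive risk functional of the weights by changing `risk` and `zeta`, which is the sense in which (1.29) generalizes the usual beta.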


Equilibrium in a Heterogeneous Market 


Does this result hold in the more realistic situation of a heterogeneous market?
A market will be said to be heterogeneous if the agents seek to fulfill
different objectives. We thus consider the following assumption:


• H3-2: There exist $N$ agents. Each agent $n$ is characterized by her choice of
  a risk measure $\rho_\zeta(n)$ so that she invests only in the mean-$\rho_\zeta(n)$ efficient
  portfolios.


According to this hypothesis, an agent $n$ invests a fraction of her wealth in
the risk-free asset and the remaining in $\Pi_n$, the mean-$\rho_\zeta(n)$ efficient portfolio,
only made of risky assets. Again, the fraction of wealth invested in the risky 
fund depends on the risk aversion of each agent, which may vary from one 
agent to another. 

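The composition of the market portfolio $\Pi$ for such a heterogeneous market
is found to be nothing but the weighted sum of the mean-$\rho_\zeta(n)$ optimal
portfolios $\Pi_n$ [333]:

$$\Pi = \sum_{n=1}^{N} \gamma_n \Pi_n , \tag{1.32}$$

where $\gamma_n$ is the fraction of the total wealth invested in the fund $\Pi_n$ by the
$n$th agent.

Moreover, for every asset $i$ and for any mean-$\rho_\zeta(n)$ efficient portfolio $\Pi_n$,
for all $n$, the following equation holds

$$\mu(i) - \mu_0 = \beta^i_n \cdot (\mu_{\Pi_n} - \mu_0) , \tag{1.33}$$

where $\beta^i_n$ is defined in (1.29). Multiplying these equations by $\gamma_n / \beta^i_n$, we get

$$\frac{\gamma_n}{\beta^i_n} \cdot (\mu(i) - \mu_0) = \gamma_n \cdot (\mu_{\Pi_n} - \mu_0) , \tag{1.34}$$

for all $n$, and summing over the different agents (with $\sum_n \gamma_n = 1$), we obtain

$$\left( \sum_n \frac{\gamma_n}{\beta^i_n} \right) \cdot (\mu(i) - \mu_0) = \left( \sum_n \gamma_n\, \mu_{\Pi_n} \right) - \mu_0 , \tag{1.35}$$

so that, since $\mu_\Pi = \sum_n \gamma_n\, \mu_{\Pi_n}$ by (1.32),

$$\mu(i) - \mu_0 = \beta^i \cdot (\mu_\Pi - \mu_0) , \tag{1.36}$$

with

$$\beta^i = \left( \sum_n \frac{\gamma_n}{\beta^i_n} \right)^{-1} .$$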


This allows us to conclude that, even in a heterogeneous market, the expected 
excess return of each individual stock is directly proportional to the expected 
excess return of the market portfolio, showing that the homogeneity of the 
market is not required for observing a linear relationship between individual 
excess asset returns and the market excess return. 
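The aggregation step (1.33)-(1.36) amounts to a weighted harmonic mean of the agent-specific betas. The sketch below checks the arithmetic on made-up numbers: the wealth fractions, the individual betas and the asset's excess return are all hypothetical, and the fractions are assumed to sum to one.

```python
import numpy as np

def aggregate_beta(gammas, betas_n):
    """Market beta of asset i: inverse of sum_n gamma_n / beta_n^i."""
    gammas = np.asarray(gammas, dtype=float)
    betas_n = np.asarray(betas_n, dtype=float)
    return 1.0 / float(np.sum(gammas / betas_n))

gammas = np.array([0.5, 0.3, 0.2])     # hypothetical wealth fractions, summing to 1
betas_n = np.array([0.8, 1.1, 1.6])    # hypothetical betas of asset i w.r.t. each fund
excess_i = 0.04                        # hypothetical mu(i) - mu_0

fund_excess = excess_i / betas_n                        # mu_{Pi_n} - mu_0 implied by (1.33)
market_excess = float(np.sum(gammas * fund_excess))     # mu_Pi - mu_0, using (1.32)
beta_i = aggregate_beta(gammas, betas_n)
```

Multiplying `beta_i` by `market_excess` returns `excess_i`, which is relation (1.36); the aggregate beta always lies between the smallest and the largest of the individual betas.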

The above calculations miss the possibility stressed by Rockafellar et al.
[408] that two kinds of efficient portfolios $\Pi_n$ may exist in a heterogeneous
market: long optimal portfolios which correspond to a net long position, and
short optimal portfolios which correspond to a net short position. While the
existence of the second kind of portfolio is not compatible with an equilibrium in
a homogeneous market,⁶ it is not precluded in a heterogeneous
market. Indeed, the net short positions of a certain class of agents can be
compensated by the net long position of another class of agents. Thus, as long
as a market portfolio $\Pi$ corresponding to an overall long position exists, an
equilibrium can be reached, and the results derived in this section still hold.


1.3.2 The Arbitrage Pricing Theory (APT)
and the Fama-French Factor Model


The CAPM proposed a solution for what Roll [414] called 


perhaps the most important unresolved problem in finance, because 
it influences so many other problems, (which) is the relation between 
risk and return. Almost everyone agrees that there should be some 
relation, but its precise quantification has proven to be a conundrum 
that has haunted us for years, embarrassed us in print, and caused 
business practitioners to look askance at our scientific squabbling and 
question our relevance. 


Indeed, past and recent tests cast strong doubts on the validity of the CAPM. 
The recent Fama-French analysis [160] shows basically no support for the 
CAPM’s central result of a positive relation between expected return and 
global market risk (quantified by the so-called beta parameter). In contrast, 
other variables, such as market capitalization and the book-to-market ratio,⁷
present some weak explanatory power. 


⁶ An equilibrium cannot be reached if all investors want to sell stocks.

⁷ Ratio of the book value of a firm to its market value. Typically, the
book-to-market ratio is used to identify undervalued companies. If the
book-to-market ratio is less than one the stock is overvalued, while it is
undervalued otherwise.


The Arbitrage Pricing Theory (APT)


The empirical inadequacy of the CAPM has led to the development of more 
general models of risk and return, such as Ross’s Arbitrage Pricing Theory 
(APT) [418]. Quoting Sargent [427], 


Ross posited a particular statistical process for asset returns, then de- 
rived the restrictions on the process that are implied by the hypothesis 
that there exist no arbitrage possibilities. 


Like the CAPM, the APT assumes that only non-diversifiable risk is priced. 
But it differs from the CAPM by accounting for multiple causes of such risks 
and by assuming a sufficiently large number of such factors so that almost 
riskless portfolios can be constructed. Reisman recently presented a general- 
ization of the APT showing that, under the assumption that there exists no 
asymptotic arbitrage (i.e., in the limit of a large number of factors, the market
risk can be decreased to almost zero), there exists an approximate multi-beta 
pricing relationship relative to any admissible proxy of dimension equal to the 
number of factors [402]. Unlike the CAPM which specifies returns as a linear 
function of only systematic risk, the APT is based on the well-known obser- 
vations that multiple factors affect the observed time series of returns, such as 
industry factors, interest rates, exchange rates, real output, the money sup- 
ply, aggregate consumption, investor confidence, oil prices, and many other 
variables [414]. However, while observed asset prices respond to a wide variety 
of factors, there is much weaker evidence that equities with larger sensitivity 
to some factors give higher returns, as the APT requires. 


The Fama-French Three-Factor Model


This empirical weakness in the APT has led to further generalizations of 
factor models, such as the Fama-French three-factor model [160], which does 
not use an arbitrage condition anymore. Fama and French started with the 
observation that two classes of stocks show better returns than the average 
market: (1) stocks with small market capitalization (“small caps”) and (2) 
stocks with a high book-value-to-price ratio (often “value” stocks as opposed 
to “growth” stocks). They added the overall market return to obtain the 
three factors: (i) the overall market return (Rm), (ii) the performance of small 
stocks relative to big stocks (SMB, small minus big), and (iii) the performance 
of value stocks relative to growth stocks (HML, high minus low). See the 
website of Professor K.R. French
(http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html),
which updates every quarter the benchmark factors and also presents the
performance of several benchmark portfolios using different combinations of
weights on the three factors. An important observation must be made concerning
Fama and French’s approach to risk in their factor decomposition: they still
see, as in the CAPM and APT, a large
return as a reward for taking a high risk. For instance suppose that returns are 
found to increase with book/price. Then those stocks with a high book/price 
ratio must be more risky than average. This is in a sense the opposite to the 
traditional interpretation of a financial professional analyst, who would say 
that high book/price indicates a buying opportunity because the stock looks 
cheap. In contrast, according to the efficient market theory, a stock that is
cheap can only be so because investors think it is risky.
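
As a rough illustration of how the three factors enter a regression, the following sketch estimates the loadings of a single asset on synthetic factor series. Everything numerical in it (factor volatilities, "true" loadings, noise level) is invented for the example, the function name `three_factor_fit` is hypothetical, and the market factor is taken in excess of the risk-free rate, as is usual when the model is written as a regression. Real factor series would have to be taken from a data source such as the one cited above.

```python
import numpy as np

def three_factor_fit(excess_returns, factor_returns):
    """OLS estimate of (alpha, betas) in r = alpha + F @ betas + noise."""
    design = np.column_stack([np.ones(len(factor_returns)), factor_returns])
    coef, *_ = np.linalg.lstsq(design, excess_returns, rcond=None)
    return coef[0], coef[1:]

# Synthetic data: every number below is a hypothetical choice for illustration.
rng = np.random.default_rng(0)
n_obs = 2000
# Columns: market excess return, SMB, HML.
factors = rng.normal(0.0, [0.04, 0.03, 0.03], size=(n_obs, 3))
true_betas = np.array([1.1, 0.5, -0.3])
asset_excess = factors @ true_betas + rng.normal(0.0, 0.02, size=n_obs)

alpha_hat, betas_hat = three_factor_fit(asset_excess, factors)
print("alpha:", alpha_hat)
print("betas (market, SMB, HML):", betas_hat)
```

In the risk-based reading described above, a positive estimated loading on HML or SMB is interpreted as exposure to a priced risk rather than as evidence that the stock is a bargain.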

Actually, the relationship between return and risk is not automatically 
positive. Diether et al. [124] have recently documented that firms with more 
uncertain earnings (as measured by the dispersion of analysts’ forecasts) have 
smaller stock returns. As stressed by Johnson [255], this finding is important 
because it directly links asset returns with information, but the relation is 
apparently in contradiction with standard economic wisdom: the larger the 
risks, the smaller the return! Actually, Johnson proposes a simple explanation 
reconciling this new anomaly with the standard asset pricing theory, which is 
based on the following ingredients: (i) the equity value of the leveraged firm 
(i.e., with non-zero debt) is equivalent to a call option on the firm’s value, 
following Merton’s model of credit risk [358]; (ii) the dispersion of analysts’ 
forecasts is a measure of idiosyncratic risk, which is not priced. Then, by 
the Black-Merton-Scholes price for the equity-call option, the firm’s expected
excess return (i.e., relative variation of the equity price) has its risk premium
amplified by a factor reflecting the effective exposure of the equity price to 
the real firm value. This factor turns out to decrease with increasing volatility, 
because more unpriced risk raises the option value, which has the consequence 
of lowering its exposure to priced risks. It is important to stress that this 
effect increases with the firm leverage and vanishes if the firm has no debt, 
as verified empirically with impressive strength in [255]. This new anomaly is 
thus fundamentally due to the impact of the volatility in the option pricing 
of the firm equity value in the presence of debt, together with the existence 
of a non-priced component of volatility. 
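
The mechanism can be made concrete with a minimal numerical sketch. It assumes the textbook Black-Merton-Scholes formula for a European call, treats the equity as a call on the firm value with a strike equal to the face value of the debt, and measures the exposure of the equity to the firm value by the elasticity V N(d1)/E. The function names and all parameter values are illustrative choices and are not taken from [255].

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def equity_elasticity(firm_value, debt_face, vol, rate=0.03, horizon=1.0):
    """Elasticity of the equity (a call on the firm value) to the firm value.

    Uses the Black-Merton-Scholes call price; `rate` and `horizon` are
    hypothetical defaults chosen only for this illustration.
    """
    d1 = (log(firm_value / debt_face) + (rate + 0.5 * vol**2) * horizon) / (
        vol * sqrt(horizon)
    )
    d2 = d1 - vol * sqrt(horizon)
    equity = firm_value * norm_cdf(d1) - debt_face * exp(-rate * horizon) * norm_cdf(d2)
    # (V / E) * dE/dV, with dE/dV = N(d1) for a call option.
    return firm_value * norm_cdf(d1) / equity

for vol in (0.2, 0.3, 0.4):
    print(vol, equity_elasticity(firm_value=100.0, debt_face=60.0, vol=vol))
```

With these illustrative inputs the elasticity is larger than one and falls as the total volatility is raised, and it tends to one when the debt becomes negligible, in line with the argument above.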


1.3.3 The Efficient Market Hypothesis 


The efficient market hypothesis (EMH) has a long history in finance and offers 
a mechanism for the trade-off between risk and return [158, 159]. Similarly to 
the concept of economic equilibrium, it must be understood as the result of 
repetitive feedback interactions between investors, and thus provides a top-down
answer to the question on the origin of risk and return.


Origin of Possible Efficiency of Stock Markets 


Roll uses an illuminating biological analogy to explain the principle leading to 
the EMH, in terms of the model of the hawks and the doves [414], which has 
been introduced to illustrate the concept of an evolutionary stable equilibrium 
(see also [118] for a seminal presentation of the concept of an evolutionary
stable equilibrium in the genetic and biological context): 


Biologists note that competition for food results in a stable evolution- 
ary equilibrium characterized by multiple strategies. When competi- 
tors meet at a food site, they can either fight over the prize and risk 
injury — the “hawk” strategy — or withdraw and lose the food — the 
“dove” strategy. If every individual fights, a mutant who withdraws 
would eventually have a greater probability of procreating than the 
average fighter because of the risk of injury and the fact that only one 
fighter can win. (The dove occasionally finds uncontested food.) On 
the other hand, if every individual followed the dove strategy, a single 
fighter would gain a lot of food. The evolutionary equilibrium can be 
shown to involve either (a) part of the population always follows the 
hawk strategy and the complementary part follows the dove strategy 
or (b) every individual follows a randomized strategy, sometimes be- 
having as a hawk and sometimes as a dove. We can definitely rule out 
a world in which everyone follows the same fixed strategy. 

The analogy to market efficiency is immediate: investors compete for 
the most “undervalued” asset. The hawk strategy is conducting se- 
curity analysis. The dove strategy is passive investing: expending no 
effort on information analysis. Clearly, if everyone analyses securities, 
the benefits will be less than the costs. If everyone is passive, the 
benefits of analysis will be tremendous. The equilibrium is that some 
analyze, some don’t. Does it sound familiar? Note that the final equi- 
librium is characterized by a situation in which it is not worthwhile for 
the marginal passive investor to begin analyzing nor for the marginal 
active investor to cease conducting security analysis. 


The EMH is an idealization of a self-consistent dynamical state of the 
market resulting from the incessant actions of the traders (arbitragers). It 
is not the out-of-fashion equilibrium approximation sometimes described but 
rather embodies a very subtle cooperative organization of the market. A grow- 
ing number of academic studies and many practitioners have questioned the 
EMH on the basis of the non-rationality of individuals. Studies in psychology 
and behavioral sciences show indeed that people cannot be represented faith- 
fully by the Von Neumann/Morgenstern axioms of expected utility, especially 
in their limited intelligence, partial memory of the past and finite processing 
abilities, in their overconfidence and their biased assessments of probabilities 
[469]. However, interestingly, there are many works that demonstrate that 
“zero-intelligence” agents (to use the term of Farmer et al. [166]), who are 
very inefficient individually, often collectively provide efficient solutions. In- 
deed, the relevant question for understanding stock markets is not so much 
to focus on these irrationalities but rather to study how they aggregate in 
the complex, long-lasting, repetitive, and subtle environment of the market. 
This extension requires abandoning the emphasis on the description of the
individual in favor of the search for emerging collective behaviors. Three fields
of research highlight this idea and suggest a reconciliation, while enlarging 
significantly the perspective of the EMH. 


Collective Phenomena in Statistical Physics 


In statistical physics, the fight between order (through the interaction between 
elementary constituents of matter) and disorder (modeled by thermal
fluctuations) gives rise to “spontaneous symmetry breaking,” also called phase
transitions in this context [451]. The understanding
of the large-scale organization as well as the sudden macroscopic changes
of organization due to small variations of a control parameter has led to power- 
ful concepts such as “emergence” [9]: the macroscopic organization has many 
properties not shared by its constituents. For the market, this suggests that 
its overall properties can only be understood through the study of the trans- 
formation from the microscopic level of individual agents to the macroscopic 
level of the global market. In statistical physics, this can often be performed 
by the very powerful tool called the “renormalization group” [490, 489]. 


Collective Phenomena in Biological Systems 


Biology has clearly demonstrated that an organism has greater abilities than 
its constituent parts. This is true for multiorgan animals as well as for insect 
societies for instance (see E. O. Wilson’s book [488]). More recently, this has 
led to the concept of “swarm intelligence” [67, 68, 70, 135]: the collective be- 
haviors of (unsophisticated) agents interacting locally with their environment 
may cause coherent functional global patterns to emerge. Swarm intelligence 
is being used to obtain collective (or distributed) problem solving without cen- 
tralized control or the provision of a global model in many practical industrial 
applications [69]. The importance of evolution, competition, and ecologies to 
understand stock markets has been stressed by Farmer [164]. 


Collective Phenomena in Agent-Based Models 


Agent-based models (also called multi-agent games) are composed of collec- 
tions of synthetic, autonomous, interacting entities. They are used to explore 
how structure and interactions control the emergence of macroscopic behav- 
iors [24]. Ultimately, the goal is to produce faithful synthetic models of reality 
by capturing the salient structure and strategies of real agents. The so-called 
minority game is perhaps the simplest in the class of multi-agent games of 
interacting inductive agents with limited abilities competing for scarce
resources. Many published works on the minority game have motivated their study
by their relevance to financial markets, because investors exhibit a large het- 
erogeneity of investment strategies, investment horizons, risk aversions and 
wealth, and have limited resources and time to dedicate to novel strategies,
and the minority mechanism is found in markets. For an introduction to the 
Minority Game see [92, 251] and the Web page on the Minority Game by D. 
Challet at www.unifr.ch/econophysics/minority/minority.html. An important
outcome of this work is the discovery of different market regimes, depending 
on the value of a control parameter, roughly defined as the ratio of the num- 
ber of effective strategies available to agents divided by the total number of 
agents. In the minority game, agents choose their strategies according to the 
condition of the market so as on average to minimize their chance of being 
in the majority. When the “control” parameter is large, the recent history of 
the game contains some useful information that strategies can exploit and the 
market is not efficient. Below a critical value of the control parameter (i.e.,
for sufficiently many agents), reasonable measures of predictability suggest 
that the market is efficient and cannot be predicted. These two phases are 
characterized by different risks, which can be quantified as a function of the 
control parameter. However, even in the “efficient market” phase, large and 
extreme price moves occur, which may be preceded by distinct patterns that 
allow agents in some cases to forecast them [289, 7]. 
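
To make the rules of the game concrete, here is a minimal simulation sketch. It assumes the most common formulation of the minority game, which is not spelled out in the text: each agent holds a few fixed random lookup tables ("strategies") mapping the last m outcomes to an action of +1 or -1, plays the table with the best virtual score, and the side chosen by fewer agents wins. The control parameter is then taken as alpha = 2^m / N. The function name `minority_game` and all parameter values are illustrative.

```python
import numpy as np

def minority_game(n_agents=101, memory=3, n_strategies=2, n_steps=2000, seed=0):
    """Attendance series A(t) = sum of the +/-1 actions (n_agents must be odd)."""
    rng = np.random.default_rng(seed)
    n_histories = 2 ** memory
    # strategies[i, s, h]: action prescribed to agent i by its strategy s after history h.
    strategies = rng.choice([-1, 1], size=(n_agents, n_strategies, n_histories))
    scores = np.zeros((n_agents, n_strategies))
    history = int(rng.integers(n_histories))
    agents = np.arange(n_agents)
    attendance = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        # Play the strategy with the best virtual score; noise below 1 only breaks ties.
        best = np.argmax(scores + 0.5 * rng.random(scores.shape), axis=1)
        total = int(strategies[agents, best, history].sum())
        attendance[t] = total
        minority_side = -1 if total > 0 else 1
        # Reward every strategy that would have chosen the minority side.
        scores += strategies[:, :, history] * minority_side
        # Append the winning side to the history, keeping the last `memory` bits.
        history = ((history << 1) | int(minority_side > 0)) % n_histories
    return attendance

n_agents = 101
for memory in (2, 5, 8):
    a = minority_game(n_agents=n_agents, memory=memory)
    print(f"alpha = {2**memory / n_agents:.3f}, variance/N = {a.var() / n_agents:.2f}")
```

The variance of the attendance per agent is one simple measure of the risk attached to each regime. A short run such as this one can only be expected to show qualitatively how that measure changes with the control parameter.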


Self-Organization During Bubbles and Crashes 


A particular type of organization which requires special mention in this book 
is found in the occurrence of crashes. Market crashes exemplify in a dramatic 
way the spontaneous emergence of extreme events in self-organizing systems. 
Stock market crashes are indeed remarkable vehicles of important ideas needed 
to deal and cope with our risky world, as explained in [450]. By studying the 
frequency distribution of drawdowns, or runs of successive losses, Johansen 
and Sornette have shown that large financial crashes are “outliers” [249]: they 
form a class of their own which is characterized by its specific statistical signa- 
tures. An important consequence derives from this property: if large financial 
crashes are “outliers,” they are special and thus require a special explanation,
a specific model, a theory of their own. In addition, their special properties 
may perhaps be used for their prediction. The main mechanism at work in 
bubbles and then in their destabilization during crashes is the existence of pos- 
itive feedbacks, i.e., self-reinforcement. Positive feedbacks have many sources 
both technical and behavioral, a dominant one being imitative behavior and 
herding between investors [450], which has been associated with behavioral 
“irrational exuberance” [438]. Positive feedbacks provide the fuel for the de- 
velopment of speculative bubbles, preparing the instability for a major crash. 
The understanding of financial bubbles and crashes requires a synthesis
between the theory of collective behavior and the economic theory
of anticipating agents who can change the future by their forecasts and the 
actions based on them. During a time of market instabilities, the tools of eco- 
nomic and financial theory break down; for instance, the idea of portfolio in- 
surance breaks down as no portfolio can be perfectly insured against extreme 
deviations, especially those that occurred in October 1987 and wiped out
confidence in the so-called portfolio insurance methods of
Leland-O’Brien-Rubinstein Associates. Similarly, the assumptions of near-normal
distributions and stable covariance broke down during the failure of LTCM
(Long-Term Capital Management) in October 1998 [394].


1.3.4 Emergence of Dependence Structures in the Stock Markets


Factors and Large Eigenvalues of Correlation Matrices 


As mentioned above, factor models are nowadays the approaches most often 
used for extracting regularities in and for explaining the vagaries of stock mar- 
ket prices. Factor models conceptually derive from and generalize the CAPM 
and APT models. Factors, which are often invoked to explain prices, are the 
overall market factor and the factors related to firm size, firm industry and 
book-to-market equity, thought to embody most of the relevant dependence 
structure between the studied time series [160, 161]. Indeed, there is no doubt 
that observed equity prices respond to a wide variety of unanticipated fac- 
tors, but there is much weaker evidence that expected returns are higher for 
equities that are more sensitive to these factors, as required by Markowitz’s 
mean-variance theory, by the CAPM and the APT [414]. This severe failure 
of the most fundamental finance theories could conceivably be attributed to 
an inappropriate proxy for the market portfolio, but nobody has been able 
to show that this is really the correct explanation. This remark constitutes 
the crux of the problem: the factors invoked to model the cross-sectional de- 
pendence between assets are not known in general and are either postulated 
based on the economic intuition in financial studies, or obtained as black-box 
results in the recent analyses applying random matrix theory to large
financial covariance matrices [392, 288]. In other words, explanatory factors emerge
endogenously. 

Here, we follow [337] to show that the existence of factors has a natural
bottom-up explanation: they can be seen to result from a collective effect of 
the assets, similar to the emergence of a macroscopic self-organization of in- 
teracting microscopic constituents. To show this, we unravel the origin of the 
large eigenvalues of large covariance and correlation matrices and provide a 
complete understanding of the coexistence of features resembling properties 
of random matrices and of large “anomalous” eigenvalues. The main insight 
here is that, in any large system possessing non-vanishing average correlations 
between a finite fraction of all pairs of elements, a self-organized macroscopic 
state generically exists. In other words, “explanatory” factors emerge endoge- 
nously. 


Derivation of the Largest Eigenvalues 


Let us first consider a large basket of $N$ assets with correlation matrix $C$
in which every non-diagonal pair of elements exhibits the same correlation
coefficient $C_{ij} = \rho$ for $i \neq j$ and $C_{ii} = 1$. Its eigenvalues are

$$\lambda_1 = 1 + (N-1)\rho \quad \text{and} \quad \lambda_{i \geq 2} = 1 - \rho , \tag{1.38}$$

with multiplicity $N - 1$ and with $\rho \in (0,1)$ in order for the correlation
matrix to remain positive definite. Thus, in the large size limit $N \to \infty$, even
for a weak positive correlation $\rho \to 0$ (with $\rho N \gg 1$), a very large
eigenvalue appears, associated with the “delocalized” (i.e., uniformly spread over
all components) eigenvector $v_1 = (1/\sqrt{N})(1, 1, \cdots, 1)$, which dominates
completely the correlation structure of the system. This trivial example stresses
that the key point for the emergence of a large eigenvalue is not the strength
of the correlations, provided that they do not vanish, but the large size $N$ of
the system.
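
The spectrum of this equicorrelated matrix can be verified in two lines. The notation $\mathbf{1}$ for the column vector of ones and $I$ for the unit matrix is introduced here only for this check:

```latex
\begin{align*}
C &= (1-\rho)\, I + \rho\, \mathbf{1}\mathbf{1}^{T}, \qquad \mathbf{1}^{T}\mathbf{1} = N, \\
C\,\mathbf{1} &= (1-\rho)\,\mathbf{1} + \rho N\,\mathbf{1}
              = \bigl[\,1 + (N-1)\rho\,\bigr]\,\mathbf{1}, \\
C\,v &= (1-\rho)\,v \quad \text{for every } v \text{ with } \mathbf{1}^{T} v = 0 .
\end{align*}
```

The uniform vector therefore carries the eigenvalue $1 + (N-1)\rho$, while the $(N-1)$-dimensional subspace orthogonal to it carries the eigenvalue $1 - \rho$.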

This result (1.38) still holds qualitatively when the correlation coefficients 
are all distinct. To see this, it is convenient to use a perturbation approach. 
We thus add a small random component to each correlation coefficient: 


$$C_{ij} = \rho + \epsilon \cdot a_{ij} \quad \text{for } i \neq j , \tag{1.39}$$


where the coefficients $a_{ij} = a_{ji}$ have zero mean, variance $\sigma^2$ and are
independently distributed (there are additional constraints on the support of the
distribution of the $a_{ij}$’s in order for the matrix $C_{ij}$ to remain positive
definite with probability one). The determination of the eigenvalues and
eigenvectors of $C_{ij}$ is performed using perturbation theory up to the second
order in $\epsilon$. We find that the largest eigenvalue satisfies


$$\mathrm{E}[\lambda_1] = (N-1)\rho + 1 + \frac{(N-1)(N-2)}{N^2} \cdot \frac{\epsilon^2 \sigma^2}{\rho} + O(\epsilon^3) \tag{1.40}$$

while, at the same order, the corresponding eigenvector $v_1$ remains unchanged.
The degeneracy of the eigenvalue $\lambda = 1 - \rho$ is broken and leads to a complex
set of smaller eigenvalues described below.

In fact, this result (1.40) can be generalized to the non-perturbative domain
of any correlation matrix with independent random coefficients $C_{ij}$,
provided that they have the same mean value $\rho$ and variance $\sigma^2$. Indeed, in
such a case, the expectations of the largest and second largest eigenvalues are
[180]

$$\mathrm{E}[\lambda_1] = (N-1) \cdot \rho + 1 + \sigma^2/\rho + o(1) , \tag{1.41}$$
$$\mathrm{E}[\lambda_2] \leq 2\sigma\sqrt{N} + O(N^{1/3} \log N) . \tag{1.42}$$


Moreover, the statistical fluctuations of these two largest eigenvalues are
asymptotically (for large fluctuations $t > O(\sqrt{N})$) bounded by a Gaussian term
according to the following large deviation result

$$\Pr\{|\lambda_{1,2} - \mathrm{E}[\lambda_{1,2}]| \geq t\} \leq e^{-c_{1,2} t^2} , \tag{1.43}$$

for some positive constant $c_{1,2}$ [279]. Numerical simulations of the distribution
of eigenvalues of a random correlation matrix confirm that the largest
eigenvalue is indeed proportional to $N$, while the bulk of the eigenvalues are
much smaller and are described by a modified semicircle law [357] centered
on $\lambda = 1 - \rho$, in the limit of large $N$.

This result is very different from that obtained when the mean value $\rho$
vanishes. In such a case, the distribution of eigenvalues of the random matrix
$C$ is given by the semicircle law [357]. However, due to the presence of the ones
on the main diagonal of the correlation matrix $C$, the center of the circle is not
at the origin but at the point $\lambda = 1$. Thus, the distribution of the eigenvalues
of random correlation matrices with zero mean correlation coefficients is a
semicircle of radius $2\sigma\sqrt{N}$ centered at $\lambda = 1$.

The result (1.41) is deeply related to the so-called friendship theorem 
in mathematical graph theory, which states that, in any finite graph such 
that any two vertices have exactly one common neighbor, there is one and 
only one vertex adjacent to all other vertices [155]. A more heuristic but 
equivalent statement is that, in a group of people such that any pair of persons 
have exactly one common friend, there is always one person (the “politician” ) 
who is the friend of everybody. Consider the matrix $C$ with its non-diagonal
entries $C_{ij}$ ($i \neq j$) equal to Bernoulli random variables with parameter
$\rho$, that is, $\Pr[C_{ij} = 1] = \rho$ and $\Pr[C_{ij} = 0] = 1 - \rho$. Then, the
matrix $C - I$, where $I$ is the unit matrix, becomes nothing but the adjacency
matrix of the random graph $G(N, \rho)$ [279]. The proof of [155] of the
“friendship theorem” indeed relies on the $N$-dependence of the largest
eigenvalue and on the $\sqrt{N}$-dependence of the second largest eigenvalue of
$C_{ij}$ as given by (1.41) and (1.42).

Figure 1.1 shows the distribution of eigenvalues of a random correlation 
matrix. The inset shows the largest eigenvalue lying at the predicted size 
$\rho N = 56.8$, while the bulk of the eigenvalues are much smaller and are
described by a modified semicircle law centered on $\lambda = 1 - \rho$, in the limit of
large $N$. This result, on the largest eigenvalue emerging from the collective
effect of the cross-correlation between all $N(N-1)/2$ pairs, provides a novel
perspective to the observation [40, 413] that the only reasonable explanation 
for the simultaneous crash of 23 stock markets worldwide in October 1987 
is the impact of a world market factor: according to the results (1.41) and 
(1.42) and the view expounded by Fig. 1.1, the simultaneous occurrence of 
significant correlations between the markets worldwide is bound to lead to 
the existence of an extremely large eigenvalue, the world market factor con- 
structed by... a linear combination of the 23 stock markets! What this result 
shows is that invoking factors to explain the cross-sectional structure of stock 
returns is cursed by the chicken-and-egg problem: factors exist because stocks 
are correlated; stocks are correlated because of common factors impacting 
them. 
Fig. 1.1. Spectrum of eigenvalues of a random correlation matrix with average
correlation coefficient $\rho = 0.14$ and standard deviation of the correlation
coefficients $\sigma = 0.345/\sqrt{N}$: the ordinate is the number of eigenvalues in a
bin with value given by the abscissa. One observes that all eigenvalues except the
largest one are smaller than or equal to $\approx 1.5$. The size $N = 406$ of the
matrix is the same as in previous studies [392] for the sake of comparison. The
continuous curve is the theoretical translated semicircle distribution of
eigenvalues describing the bulk of the distribution which passes the Kolmogorov
test. The center value $\lambda = 1 - \rho$ ensures the conservation of the trace
equal to $N$. There is no adjustable parameter. The inset represents the whole
spectrum with the largest eigenvalue whose size is in agreement with the
prediction $\rho N = 56.8$. Reproduced from [337]
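
A numerical experiment of this kind is easy to reproduce in outline. The sketch below uses the parameter values quoted in the caption of Fig. 1.1, but it is not the computation behind the figure: the off-diagonal entries are drawn from a Gaussian distribution, which is an assumption made here for convenience, and the function name `random_correlation_matrix` is hypothetical.

```python
import numpy as np

def random_correlation_matrix(n, rho, sigma, rng):
    """Symmetric matrix with unit diagonal and independent off-diagonal
    entries of mean rho and standard deviation sigma (Gaussian: an assumption)."""
    upper = np.triu(rng.normal(rho, sigma, size=(n, n)), k=1)
    c = upper + upper.T
    np.fill_diagonal(c, 1.0)
    return c

n, rho = 406, 0.14
sigma = 0.345 / np.sqrt(n)
rng = np.random.default_rng(0)
c = random_correlation_matrix(n, rho, sigma, rng)
eigenvalues, eigenvectors = np.linalg.eigh(c)  # eigenvalues in ascending order

largest = eigenvalues[-1]
predicted = (n - 1) * rho + 1 + sigma**2 / rho   # leading terms of (1.41)
bulk = eigenvalues[:-1]
bulk_edge = (1 - rho) + 2 * sigma * np.sqrt(n)   # center plus semicircle radius
# Overlap of the leading eigenvector with the uniform vector (1, ..., 1)/sqrt(N).
overlap = abs(eigenvectors[:, -1].sum()) / np.sqrt(n)

print("largest eigenvalue:", largest, "predicted:", predicted)
print("bulk range:", bulk.min(), bulk.max(), "upper edge estimate:", bulk_edge)
print("overlap with uniform vector:", overlap)
```

The last quantity checks the "delocalized" character of the leading eigenvector: an overlap close to one means that the main factor is essentially the equally weighted combination of all components.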


Generalization to a Segmented Market 
with Different Coupled Industries 


Empirically [392, 288], a few other eigenvalues below the largest one have an
amplitude of the order of 5-10 and deviate significantly from the bulk of the
distribution. The above analysis provides a very simple mechanism for them,
justifying the postulated model in [373]. The solution consists in considering,
as a first approximation, the block diagonal matrix $C'$ with diagonal elements
made of the matrices $A_1, \cdots, A_p$ of sizes $N_1, \cdots, N_p$ with
$\sum_i N_i = N$, constructed according to (1.39) such that each matrix $A_i$ has
the average correlation coefficient $\rho_i$. When the coefficients of the matrix
$C'$ outside the matrices $A_i$ are zero, the spectrum of $C'$ is given by the union
of all the spectra of the $A_i$’s, which are each dominated by a large eigenvalue
$\lambda_{1,i} \simeq \rho_i \cdot N_i$.
The spectrum of $C'$ then exhibits $p$ large eigenvalues. Each block $A_i$ can be
interpreted as a sector of the economy, including all the companies belonging
to the same industrial branch, and the eigenvector associated with each largest
eigenvalue represents the main factor driving this sector of activity [343, 349].
For similar sector sizes $N_i$ and average correlation coefficients $\rho_i$, the
largest eigenvalues are of the same order of magnitude. In addition, a very large
unique eigenvalue is obtained by introducing some coupling constants outside
the block diagonal matrices. A well-known result of perturbation theory
states that such coupling leads to a “repulsion” between the eigenvalues,
which can be observed in Fig. 1.2 where $C'$ has been constructed with three
block matrices $A_1$, $A_2$, and $A_3$ and non-zero off-diagonal coupling described
in the figure caption. These values allow one to replicate quantitatively the
empirical finding of Laloux et al. in [392], where the three first eigenvalues are
approximately $\lambda_1 \approx 57$, $\lambda_2 \approx 10$ and $\lambda_3 \approx 8$.
Fig. 1.2. Spectrum of eigenvalues estimated from the sample correlation matrix
of $N = 406$ time series of length $T = 1309$. The time series have been constructed
from a multivariate Gaussian distribution with a correlation matrix made of three 
block-diagonal matrices of sizes respectively equal to 130, 140, and 136 and mean 
correlation coefficients equal to 0.18 for all of them. The off-diagonal elements are 
all equal to 0.1. The same results hold if the off-diagonal elements are random. The 
inset shows the existence of three large eigenvalues, which result from the three-block 
structure of the correlation matrix. Reproduced from [337] 
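
The construction described in the caption of Fig. 1.2 can be sketched as follows. The block sizes, series length, and correlation levels are those quoted in the caption. Taking the within-block correlations as exactly constant, generating the Gaussian series through a Cholesky factor, and the function name `block_correlation` are assumptions made here for the illustration, so the numbers printed need not coincide with those of the figure.

```python
import numpy as np

def block_correlation(sizes, rho_in, rho_out):
    """Correlation matrix with rho_in inside each diagonal block and rho_out elsewhere."""
    n = sum(sizes)
    c = np.full((n, n), float(rho_out))
    start = 0
    for size in sizes:
        c[start:start + size, start:start + size] = rho_in
        start += size
    np.fill_diagonal(c, 1.0)
    return c

sizes, n_obs = (130, 140, 136), 1309
c_true = block_correlation(sizes, rho_in=0.18, rho_out=0.1)

# Draw Gaussian series whose population correlation matrix is c_true.
rng = np.random.default_rng(0)
series = rng.standard_normal((n_obs, sum(sizes))) @ np.linalg.cholesky(c_true).T
c_sample = np.corrcoef(series, rowvar=False)

eig_true = np.linalg.eigvalsh(c_true)[::-1]      # descending order
eig_sample = np.linalg.eigvalsh(c_sample)[::-1]
print("population, four largest:", eig_true[:4])
print("sample, four largest:    ", eig_sample[:4])
```

In both the population matrix and its sample estimate, three eigenvalues stand well apart from all the others, one for each block, with the largest of them pushed further up by the coupling between blocks.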
Expressions (1.40) and (1.41) and numerical tests for a large variety of correlation
matrices show that the equally weighted eigenvector
$v_1 = (1/\sqrt{N})(1, 1, \ldots, 1)$, associated with the largest eigenvalue, is
extremely robust and remains (on average) the same for any large system. Thus,
even for time-varying correlation matrices, which result from heteroscedastic
effects, the composition of the main factor remains almost the same. This can be
seen as a generalized limit theorem reflecting the bottom-up organization of
broadly correlated time series.


1.3.5 Large Risks in Complex Systems 


These calculations show that an endogenous small positive correlation between 
all stock-pairs gives rise to large eigenvalues which can then be associated with 
“market factors.” It seems that earlier research has promoted the reverse
causality: existing market factors (stock indices, news agencies, etc.) introduce
exogenous market impacts which affect different stocks similarly, thereby
introducing positive correlation and thus large eigenvalues. This is clear from
the general formulation of (linear) factor models such as the CAPM, APT, 
and Fama-French approaches in which the returns of all stocks are regressed 
against the same set of factors. Actually, we propose that the two chains of 
cause and effect may be intrinsically coupled: the correlation structure
between stocks is a stable attractor of a self-organized dynamics with positive
and negative feedbacks in which factors exist because correlations exist, and 
correlations exist because factors exist. It would suggest the development of 
dynamical factor models, in which agents form anticipations on correlations 
based on their calibration of the past behavior of the regression to factors, 
in order to study the possible types of attractors (single or multiple equilib- 
ria) in the correlation structure of stocks. This may cast new light on the 
major unsolved problem stated in the introduction of this chapter concerning 
the relationship between return and risks: perhaps, the concept of return as 
the remuneration of risk which is so fundamental in financial theory should 
be replaced by the concept of the emergence of the risk-return duality, in 
which their relationship can be negative or positive, depending upon circum- 
stances that remain to be worked out. Moreover, simulations of complex self- 
organizing systems show that large fluctuations and extreme variations are 
the rule rather than the exception. 

The complex system approach, which involves seeing interconnections and 
relationships, i.e., the whole picture as well as the component parts, is
nowadays pervasive in modern control of engineering devices and business
management. A central property of a complex system is the possible occurrence of
coherent large-scale collective behaviors with a very rich structure, resulting 
from the repeated non-linear interactions among its constituents: the whole 
turns out to be much more than the sum of its parts. Most complex systems 
around us do exhibit rare and sudden transitions that occur over time in- 
tervals that are short compared with the characteristic time scales of their 
posterior evolution. Such extreme events express more than anything else the
underlying forces usually hidden by almost perfect balance and thus pro- 
vide the potential for a better scientific understanding of complex systems. 
These crises have fundamental societal impacts and range from large nat- 
ural catastrophes, catastrophic events of environmental degradation, to the 
failure of engineering structures, crashes in the stock market, social unrest 
leading to large-scale strikes and upheaval, economic drawdowns on national 
and global scales, regional power blackouts, traffic gridlocks, diseases and epi- 
demics, etc. An outstanding scientific question is how such large-scale patterns 
of catastrophic nature might evolve from a series of interactions on the small- 
est and increasingly larger scales. In complex systems, it has been found that 
the organization of spatial and temporal correlations does not stem, in general,
from a nucleation phase diffusing across the system. It results rather from a 
progressive and more global cooperative process occurring over the whole sys- 
tem by repetitive interactions, which is partially described by the distributed 
correlations at the origin of a large eigenvalue as described above. An instance 
would be the many occurrences of simultaneous scientific and technical discov- 
eries signaling the global nature of the maturing process. Recent developments 
suggest that non-traditional approaches, based on the concepts and methods 
of statistical and nonlinear physics coupled with ideas and tools from
computational intelligence could provide novel methods in complexity to direct the
numerical resolution of more realistic models and the identification of rele- 
vant signatures of large and extreme risks. To address the challenge posed by 
the identification and modeling of such outliers, the available theoretical tools 
comprise in particular bifurcation and catastrophe theories, dynamical critical 
phenomena and the renormalization group, nonlinear dynamical systems, and 
the theory of partially (spontaneously or not) broken symmetries. This field 
of research is presently very active and is expected to advance significantly 
our understanding, quantification, and control of risks. 

In the meantime, both practitioners and academics need reliable metrics
to characterize risks and dependences. This is the purpose of the following
chapters, which present powerful models and measures of large risks and
complex dependences between time series.


Appendix 
1.4 Why Do Higher Moments Allow us to Assess Larger Risks? 


As asserted in the main body of this chapter, the complete description of 
the fluctuations of an asset or a portfolio at a fixed time scale is given by 
the knowledge of the probability density function (pdf) of its return. The pdf 
encompasses all the risk dimensions associated with this asset. Unfortunately, 
it is impossible to classify or order the risks described by the entire pdf, 
except in special cases where the concept of stochastic dominance applies. 
Therefore, the whole pdf cannot provide an adequate measure of risk, which
should be embodied by a single variable. In order to perform a selection among 
a basket of assets and construct optimal portfolios, one needs measures given 
as real numbers, not functions, which can be ordered according to the natural 
ordering of real numbers on the line. 

In this vein, Markowitz [347] has proposed to summarize the risk of an 
asset by the variance of its returns (or equivalently by the corresponding 
standard deviation). It is clear that this description of risks is fully satisfactory
only for assets with Gaussian pdfs. In any other case, the variance generally 
provides a very poor estimate of the real risk. Indeed, it is a well-established 
empirical fact that the pdfs of asset returns have fat tails (see Chap. 2), 
so that the Gaussian approximation underestimates significantly the large 
price movements frequently observed on stock markets (see Fig. 2.1). Conse- 
quently, the variance cannot be taken as a suitable measure of risks, since it 
only accounts for the smallest contributions to the fluctuations of the asset’s 
returns. 

The variance of the return X of an asset involves its second moment $E[X^2]$
and, more precisely, is equal to its second centered moment (or moment about
the mean) $E[(X - E[X])^2]$. Thus, the weight of a given fluctuation X contributing
to the variance of the returns is proportional to its square. Due to
the decay of the pdf of X for large X bounded from above by $\sim 1/|X|^{1+\mu}$ with
$\mu > 2$ (see Chap. 2), the largest fluctuations do not contribute significantly to
this expectation. To increase their contributions, and in this way to account
for the largest fluctuations, it is natural to invoke moments of order n higher


Fig. 1.3. This figure represents the function $x^n \cdot e^{-x}$ for n = 1, 2, and 4 and shows
the typical size of the fluctuations involved in the moment of order n. Reproduced
from [333]
than 2. The larger n is, the larger is the contribution of the rare and large
returns in the tail of the pdf. This phenomenon is demonstrated in Fig. 1.3,
where we can observe the evolution of the quantity $x^n \cdot f(x)$ for n = 1, 2, and 4,
where f(x), in this example, denotes the density of the standard exponential
distribution $e^{-x}$. The expectation $E[X^n]$ is then simply represented geometrically
as the area below the curve $x^n \cdot f(x)$. These curves provide an
intuitive illustration of the fact that the main contributions to the moment
$E[X^n]$ of order n come from values of X in the vicinity of the maximum of
$x^n \cdot f(x)$, which increases fast with the order n of the moment we consider,
all the more so, the fatter is the tail of the pdf of the returns X. In addition,
the typical size of the return assessed by the moment of order n is given by
$\lambda_n = E[X^n]^{1/n}$ (which coincides with the $L^n$ norm of X, for positive random
variables). For the exponential distribution chosen to construct Fig. 1.3, the
value of x corresponding to the maximum of $x^n \cdot f(x)$ is exactly equal to n,
while $\lambda_n = n/e + O(\ln n)$. Thus, increasing the order of the moment allows one
to sample larger fluctuations of the asset prices.
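
The two statements about the exponential example can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `peak_location` and `typical_size` and the grid parameters are illustrative assumptions, and the closed form $E[X^n] = n!$ for the standard exponential distribution is used for the typical size.

```python
import math
import numpy as np


def peak_location(n, x_max=60.0, num=600_001):
    """Grid location of the maximum of x**n * exp(-x) (illustrative helper)."""
    x = np.linspace(0.0, x_max, num)
    return float(x[np.argmax(x**n * np.exp(-x))])


def typical_size(n):
    """lambda_n = E[X**n]**(1/n) for a standard exponential X, with E[X**n] = n!."""
    # lgamma(n + 1) = ln(n!) avoids overflow for large n.
    return math.exp(math.lgamma(n + 1) / n)


for n in (1, 2, 4, 50):
    # Columns: order n, location of the peak, lambda_n, and the leading term n/e.
    print(n, peak_location(n), typical_size(n), n / math.e)
```

The peak of the integrand sits at x = n, while the typical size grows like n/e, so both scales increase with the order of the moment, as stated above.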


2 


Marginal Distributions of Returns 


2.1 Motivations 


As discussed in Chap. 1, the risks of a portfolio of N assets are fully charac- 
terized by the (possibly time-dependent) multivariate distribution of returns, 
which is the joint probability of any given realization of the N asset returns. 
For Gaussian models, this requires only the estimation of the average returns 
and of their covariance matrix. However, there is no doubt anymore that the 
Gaussian model is an inadequate description of real financial data (see for 
instance Fig. 2.1): the tails of the distributions are much fatter than Gaussian 
and the dependence between assets is not fully captured by the sole covariance 
matrix. The calibration and tests of multivariate models as well as their use 
for derivative pricing, portfolio analysis, and optimization are thus daunting 
tasks, characterized by the “curse of dimensionality.” 

The present book is constructed upon the foundation offered by the math- 
ematical theory of copulas: as shown in Chap. 3 and used in subsequent chap- 
ters, any multivariate distribution can be uniquely decomposed into a part 
(the copula) capturing the intrinsic dependence between the assets and an- 
other part quantifying the risks embodied in the marginal distributions. In 
this representation, the information contained in a multivariate distribution 
of asset returns is thus decomposed in two sets: the intrinsic dependence and 
the marginals. Portfolio risks result from the multivariate composition of both
the risks embedded in the marginals and the risks due to dependence, as has been
well known since Markowitz’s mean-variance portfolio theory. Diversification of
risks may then result from two mechanisms (working independently or in conjunction):
(i) the law of large numbers (the larger the number of assets, the
smaller the amplitude of the fluctuations of the total value relative to
its mean) and (ii) anticorrelations (two assets whose prices tend to move in
opposite directions give a lower risk when combined in a portfolio). The former
mechanism often dominates in large portfolios while the latter mechanism is
at the basis of derivative hedging.

Fig. 2.1. Bivariate distribution of the daily annualized returns $r_i$ of the Swiss franc
(CHF) in US dollars (i = 1) and of the Japanese yen (JPY) in US dollars (i = 2) for the time
interval from January 1971 to October 1998. One-fourth of the data points are represented
for clarity of the figure. The contour lines are obtained by smoothing the
empirical bivariate density distribution and represent equilevels. The outer (respectively
middle and inner) line is such that 90% (respectively 50% and 10%) of the
total number of data points fall within it. It is apparent that the data are not described
by an elliptically contoured pdf as they should be if the dependence were prescribed by
a Gaussian (or more generally by an elliptic) distribution. Instead, the contour line
takes the shape of a “bean”. Also shown are the price-time series and the marginal
distributions (in log-linear scales) in the panels at the top and on the side. The
parabolas in thick lines correspond to the best fits by Gaussian distributions. The
thin lines correspond to the best fits by stretched exponentials $\sim \exp[-(r_i/r_{0i})^{c_i}]$
with exponents $c_1 = 1.14$ for CHF and $c_2 = 0.8$ for JPY. Reproduced from [457]


In the present chapter, we review the knowledge accumulated on the char- 
acterization of marginal distributions of asset returns. This knowledge com- 
bined with adequate representations of the dependence structure between as- 
sets described in the following chapters can then be used to fully define the 
multivariate risks. The present chapter thus reviews the bricks of individual
asset risks which can then be combined with the help of copulas to build the
multivariate risk edifice. The emphasis is put on the determination of the pre- 
cise shape of the tail of the distribution of returns of a given asset, which is a 
major issue both from a practical and from an academic point of view. Indeed, 
for practitioners, it is crucial to accurately estimate the high and low quan- 
tiles of the distribution of returns (profit and loss) because they are involved 
in almost all the modern risk management methods while from an academic 
perspective, many economic and financial theories rely on a specific parame- 
terization of the distributions whose parameters are intended to represent the 
“macrovariables” influencing the agents. 

For the purpose of practical market risk management, one typically needs
to assess tail risks associated with the distribution of returns or profit and
losses. Following the recommendations of the BIS,¹ one has to focus on risks
associated with positions held for 10 days. This requires considering
the distributions of 10-day returns. However, at such a large time scale, the
number of (non-overlapping) historical observations dramatically decreases.
Even over a century, one can only collect 2500 data points, or so, per asset
(about 250 trading days per year, grouped into 10-day blocks).
Therefore, the assessment of risks associated with high quantiles is particularly
unreliable.

Recently, the use of high frequency data has allowed for an accurate es- 
timation of the very far tails of the distributions of returns. Indeed, using 
samples of one to 10 million points enables one to efficiently calibrate prob- 
ability distributions up to probability levels of order 99.9995%. Then, one 
can hope to reconstruct the distribution of returns at a larger time scale by 
convolution. This is the stance taken by many researchers advocating the use of
Lévy processes to model the dynamics of asset prices [109, 196, and references 
therein]. The recent study by Eberlein and Ozkan [141] shows the relevance of 
this approach, at least for fluctuations of moderate sizes. However, for large 
fluctuations, this approach is not really accurate, as shown in Fig. 2.2, which 
compares the probability density function (pdf) of raw 60-minute returns of 
the Standard & Poor’s 500 index with the hypothetical pdf obtained by 60 
convolution iterates of the pdf of the 1-minute returns; it is clear that the 
former exhibits significantly fatter tails than the latter. 

This phenomenon derives naturally from the fact that asset returns can- 
not be merely described by independent random variables, as assumed when 
prices are modeled by Lévy processes. In fact, independence is too strong an 
assumption. For instance, the no free-lunch condition only implies the absence 
of linear time dependence since the best linear predictor of future (discounted) 
prices is then simply the current price. Volatility clustering, also called ARCH 
effect [150], is a clear manifestation of the existence of nonlinear dependences 


1 Bank for International Settlements. The BIS is an international organization
which fosters cooperation among central banks and other agencies in pursuit 
of monetary and financial stability. Its banking services are provided exclusively 
to central banks and international organizations. 
Fig. 2.2. Kernel density estimates of the raw 60-minute returns and the density
obtained by 60 convolution iterates of the raw 1-minute returns kernel density for 
the Standard & Poor’s 500 


between returns observed at different lags. These dependences prevent the use 
of convolution for estimating tail risks with sufficient accuracy. Figure 2.2 il- 
lustrates the important observation that fat tails of asset return distributions 
owe their origin, at least in part, to the existence of volatility correlations. In 
the example of Fig. 2.2, a given 60-minute return is the sum of sixty 1-minute 
returns. If there were no dependence between these sixty 1-minute returns,
the 60-minute return could be seen as the sum of 60 independent random 
variables; hence, its probability density could be calculated exactly by taking 
60 convolutions of the probability density of the 1-minute returns. Note that 
this 60-fold convolution is equivalent to estimating the density of 60-minute 
returns in which their sixty 1-minute returns have been reshuffled randomly 
to remove any possible correlation. Figure 2.2 shows a faster decay of the pdf 
of these reshuffled 60-minute returns compared with the pdf of the true em- 
pirical 60-minute returns. Thus, assessing extreme risks at large time scales (1 
or 10 days) by simple convolution of the distribution of returns at time scales 
of 1 or of 5 minutes leads to crude approximations and to dramatic underes- 
timations of the amount of risk really incurred. The role of the dependence 
between successive returns is even more important in times of crashes: very 
large drawdowns (amplitudes of runs of losses) have been shown to result from 
anomalous transient dependences between a few successive days [249, 250]; as 
a consequence, they cannot be explained or modeled by the distribution cal- 
ibrated for the bulk (99%) of the rest of the sample of drawdowns. These 
extreme events have been termed “outliers”, “kings” [286] or “black swans” 
[470]. 
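
The reshuffling argument can be illustrated on synthetic data. The sketch below is not the authors' procedure and does not use the Standard & Poor's 500 data: it assumes a GARCH(1,1) process with arbitrarily chosen parameters as a stand-in for 1-minute returns with volatility clustering, and compares the excess kurtosis of the true 60-step sums with that of sums built from randomly reshuffled returns (the empirical counterpart of the 60-fold convolution). All function names are illustrative.

```python
import numpy as np


def simulate_garch(n, omega=1e-8, alpha=0.15, beta=0.80, seed=0):
    """Synthetic returns with volatility clustering (assumed GARCH(1,1) model)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    r = np.empty(n)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(var) * z[t]
        var = omega + alpha * r[t] ** 2 + beta * var
    return r


def aggregate(r, k):
    """Sum non-overlapping blocks of k consecutive returns."""
    m = len(r) // k
    return r[: m * k].reshape(m, k).sum(axis=1)


def excess_kurtosis(x):
    d = x - x.mean()
    return float(np.mean(d**4) / np.mean(d**2) ** 2 - 3.0)


one_minute = simulate_garch(300_000)
# Reshuffling destroys all time dependence but keeps the marginal distribution.
shuffled = np.random.default_rng(1).permutation(one_minute)

k_true = excess_kurtosis(aggregate(one_minute, 60))
k_shuffled = excess_kurtosis(aggregate(shuffled, 60))
print("excess kurtosis, true 60-step returns:      ", k_true)
print("excess kurtosis, reshuffled 60-step returns:", k_shuffled)
```

Under these assumptions the reshuffled sums are close to Gaussian, while the true aggregated returns keep a markedly larger kurtosis, which is the qualitative effect shown in Fig. 2.2.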
The only way to reliably aggregate high-frequency data is to have a consistent
model at one’s disposal. By consistent model is meant a model that
accounts for the complex time structure of asset returns. (FI)-GARCH [31],
α-ARCH [132], multifractal² models [39, 341] or any other stochastic volatility
model [232, 473] can be used for this purpose, but none of them is yet
universally recognized since they do not rely on well-established
economic foundations. As a consequence, one is exposed to model error: for
instance, a simple GARCH model still underestimates the tail risks since it
underestimates the long-range dependence of the volatility.

In this context, the most pragmatic approach may be to let the data speak
for themselves, which is the stance taken in this chapter. For each different
horizon, we discuss the possible parametric distributions that fit the data 
best. As we shall see, three main scales should be distinguished: small scale (a 
few minutes), intermediate scale (about an hour) and a large scale (1 day or 
more). At the smallest time scale, we will see that the tails of distributions are 
probably decaying more slowly than any power law. At the medium scale, reg- 
ularly varying distributions provide a reasonable model, while at time scales 
of 1 day or more, rapidly varying distributions — like Weibull distributions — 
seem to accurately describe the tails of the distributions of asset returns, at 
least in the range of quantiles useful for risk management. 


2.2 A Brief History of Return Distributions 


The distribution of returns is one of the most basic characteristics of the
markets and many academic studies have been devoted to it. Contrary to
the average or expected return, which economic theory provides guidelines
to assess in relation to risk premium, firm size, or book-to-market (see
Chap. 1 and [161] for instance), the functional form of the distribution of
returns, and especially of extreme returns, is much less constrained and still
a topic of active debate.


2.2.1 The Gaussian Paradigm 


Generally, the central limit theorem would lead to a Gaussian distribution 
for sufficiently large time intervals over which the return is estimated. Tak- 
ing the continuous time limit, such that any finite time interval is seen as 
the sum of an infinite number of increments thus leads to the paradigm of 
log-normal distributions of prices and equivalently of Gaussian distributions 
of returns. Based on the pioneering work of Bachelier [26] and later improved 


2 While fractal objects, processes, or measures enjoy a global scale invariance property
(i.e., look similar at any (time) scale), multifractals only enjoy this property
locally, i.e., they can be conceived as a fractal superposition of infinitely many
local fractals.
Table 2.1. Descriptive statistics for the daily Dow Jones Industrial Average index
returns (from 27 May 1896 to 31 May 2000, sample size n = 28415) calculated
over 1 day and 1 month and for Nasdaq Composite Index returns calculated over 5
minutes and 1 hour (from 8 April 1997 to 29 May 1998, sample size n = 22123)

                        Mean            St. dev.       Skewness   Ex. Kurtosis   Jarque-Bera
Nasdaq (5 minutes) †     1.80 × 10^-6   6.61 × 10^-4    0.0326    11.8535        1.30 × 10^5 (0.00)
Nasdaq (1 hour) †        2.40 × 10^-5   3.30 × 10^-3    1.3396    23.7946        4.40 × 10^4 (0.00)
Nasdaq (5 minutes) ‡    -6.33 × 10^-9   3.85 × 10^-4   -0.0562     6.9641        4.50 × 10^4 (0.00)
Nasdaq (1 hour) ‡        1.05 × 10^-6   1.90 × 10^-3   -0.0374     4.5250        1.58 × 10^3 (0.00)
Dow Jones (1 day)        8.96 × 10^-5   4.70 × 10^-3   -0.6101    22.5443        6.03 × 10^5 (0.00)
Dow Jones (1 month)      1.80 × 10^-3   2.54 × 10^-2   -0.6998     5.3619        1.28 × 10^3 (0.00)

(†) Raw data, (‡) data corrected for the U-shape of the intraday volatility due to the
opening, lunch, and closing effects.

The Dow Jones exhibits a significantly negative skewness, which can probably be ascribed
to the impact of the market crashes. The raw Nasdaq returns are significantly positively
skewed while the returns corrected for the “lunch effect” are negatively skewed, showing
that the lunch effect plays an important role in the shaping of the distribution of the
intraday returns. Note also the important decrease of the kurtosis after correction of the
Nasdaq returns for the lunch effect, confirming the strong impact of the lunch effect. In
all cases, the excess kurtosis is high and remains significant even after a time aggregation
of one month. The numbers within parentheses represent the p-value of Jarque-Bera’s
normality test, a joint statistic using skewness and kurtosis coefficients [116]: the normality
assumption is rejected for these time series. The Lagrange multiplier test proposed in [151]
allows one to test for heteroscedasticity. It leads to the $T \cdot R^2$ test statistic, where T denotes
the sample size and $R^2$ is the determination coefficient of the regression of the squared
centered returns $x_t^2$ on a constant and on q of their lags $x_{t-1}^2, x_{t-2}^2, \ldots, x_{t-q}^2$. Under the
null hypothesis of homoscedastic time series, $T \cdot R^2$ follows a $\chi^2$ distribution with q degrees
of freedom. The test, performed up to lag q = 10, shows that, in every case, the null
hypothesis is strongly rejected at any usual significance level. Thus, the time series are
heteroscedastic and exhibit volatility clustering. The BDS test [84], which allows one to
detect not only volatility clustering, as in the previous test, but also departure from iid-ness
due to non-linearities, confirms that the null hypothesis of iid data is strongly rejected
at any usual significance level. Reproduced from [329].


by Osborne [377] and Samuelson [425], the log-normal paradigm has been
the starting point of many financial theories such as Markowitz’s portfolio
selection method [347], Sharpe’s market equilibrium model (CAPM) [429] or
Black and Scholes’ rational option pricing theory [60]. However, for real financial
data, the convergence in distribution to a Gaussian law is very slow (see
for instance [72, 88]), much slower than predicted for independent returns.
As shown in Table 2.1, the excess kurtosis (which is zero for a normal distribution)
typically remains large even for monthly returns, testifying to (i)
significant deviations from normality, (ii) the heavy tail behavior of the
distribution of returns and (iii) significant time dependences between asset
returns [88].
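
The two test statistics described in the notes to Table 2.1 can be sketched as follows. This is a minimal illustration on synthetic data, not the code used to produce the table: the Jarque-Bera statistic is taken in its usual form $n/6 \cdot (S^2 + K^2/4)$, with S the skewness and K the excess kurtosis, and the function names, the ARCH(1) toy process and its parameters are assumptions made for the example.

```python
import numpy as np


def jarque_bera(x):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    s2 = np.mean(d**2)
    skew = np.mean(d**3) / s2**1.5
    ex_kurt = np.mean(d**4) / s2**2 - 3.0
    jb = x.size / 6.0 * (skew**2 + ex_kurt**2 / 4.0)
    # The survival function of a chi-square law with 2 dof is exp(-x/2).
    return float(jb), float(np.exp(-jb / 2.0))


def arch_lm(x, q=10):
    """T * R^2 from regressing squared centered returns on q of their lags."""
    x = np.asarray(x, dtype=float)
    y = (x - x.mean()) ** 2
    target = y[q:]
    lags = [y[q - j : len(y) - j] for j in range(1, q + 1)]
    design = np.column_stack([np.ones_like(target)] + lags)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    r2 = 1.0 - resid.var() / target.var()
    return float(len(target) * r2)


rng = np.random.default_rng(0)
iid = rng.standard_normal(5000)

# Toy ARCH(1) series (assumed parameters) exhibiting volatility clustering.
arch = np.empty(5000)
prev = 0.0
for t, z in enumerate(rng.standard_normal(5000)):
    prev = np.sqrt(0.2 + 0.5 * prev**2) * z
    arch[t] = prev

print("iid Gaussian :", jarque_bera(iid), arch_lm(iid))
print("ARCH(1)      :", jarque_bera(arch), arch_lm(arch))
# For q = 10, the 5% critical value of the chi-square law is about 18.31.
```

As a consistency check on the formula, inserting n = 22123, skewness 0.0326 and excess kurtosis 11.8535 gives a Jarque-Bera statistic of about $1.30 \times 10^5$, the value reported in the first row of Table 2.1.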


2.2.2 Mechanisms for Power Laws in Finance 


Another approach rooted in economic theory, which can be invoked to derive
the distribution of financial returns, consists in applying the “Gibrat principle”
[441], initially introduced to account for the growth of cities and of wealth
through a mechanism combining stochastic multiplicative and additive noises
[55, 207, 268, 446, 454] leading to a Pareto distribution of sizes [94, 193]. Rational
bubble models à la Blanchard [61] can also be cast in this mathematical
framework of stochastic recurrence equations and lead to distributions with
regularly varying tails, albeit with a strong constraint on the tail exponent
(see [323] for the monovariate case and [331] for the multivariate case). These
works suggest that an alternative and natural way to capture the heavy tail
character of the distributions of returns is to use distributions with power-like
tails (Pareto, generalized Pareto, Lévy stable laws) or more generally, regularly
varying distributions [57],³ the latter encompassing all the former
ones. At first glance, Fig. 2.3, which depicts the complementary sample distribution
function for the 30-minute returns of the Standard & Poor’s 500,
seems to substantiate this thesis.

Other mechanisms involving the existence of a long memory of the volatil- 
ity have been recently found to describe many of the stylized facts of mono- 
variate financial returns. In particular, the multifractal random walk (MRW) 
is a process constructed with a very long memory in the volatility, such that 
it has a bona fide continuous limit with exact multifractal properties on the 
absolute values of the returns [27]. Appendix 2.A defines random cascade 
models from which the MRW derives and summarizes their main proper- 
ties. For random cascade models exhibiting multifractality, it has been shown 
exactly that the random variables, defined in Sect. 2.A.1 as the increments
$\delta_{\Delta t} X(t) = X(t) - X(t - \Delta t)$ corresponding to the log-returns calculated over
the horizon $\Delta t$, are distributed in the tail according to a power law

    $\Pr[\delta_{\Delta t} X > x] = \mathcal{L}(x) \cdot x^{-b}$,    (2.1)

where the exponent b is given by

    $b = \sup\{q,\ q > 1,\ \zeta(q) > 1\}$,    (2.2)

and $\zeta(q)$, defined in (2.A.7), is the spectrum of exponents of the moments of
the absolute value of the log-returns (see Appendix 2.A). For the MRW, we
have $\zeta(q) = (q - q(q-2)\lambda^2)/2$ according to (2.A.36), where $\lambda^2$ is the so-called
multifractal parameter. Condition (2.2) then yields $b = 1/\lambda^2$ to leading order.
Another equivalent way to arrive at the same result is to use the moments
defined by (2.A.34) with (2.A.35), which can be shown to diverge (to $+\infty$)


3 The general representation of a regularly varying distribution F is given by
$1 - F(x) = \mathcal{L}(x) \cdot x^{-\alpha}$, where $\mathcal{L}(\cdot)$ is a slowly varying function, that is, a function
such that $\lim_{x \to \infty} \mathcal{L}(tx)/\mathcal{L}(x) = 1$ for any finite t. The parameter $\alpha$ is usually
called the tail index or the tail exponent.
Fig. 2.3. Complementary sample distribution function for the Standard & Poor’s
500 30-minute returns over the two decades 1980-1999. The plain (resp. dotted)
line depicts the complementary distribution for the positive (resp. the absolute value of
the negative) returns. Reproduced from [330]


if $\zeta(q) < 1$ [27]. The calibration⁴ of $\lambda^2$ gives in general very small values in the
range 0.01-0.04 leading to a tail index b in the range 15-50 [366]. This has led
previous workers to conclude that such a large tail exponent is unobservable
with available data sets, and may well be described by other effective laws.

However, Muzy et al. [365] have recently shown that empirical distributions
of log-returns do not give access to the unconditional prediction $b \approx 15$-$50$
with (2.2). This is because the value of q determining the exponent b according
to (2.2) is itself associated with an $\alpha$ (through the Legendre transformation
(2.A.20) and (2.A.21)) for which the multifractal spectrum $f(\alpha)$ defined in
Sect. 2.A.2 is negative. But negative $f(\alpha)$’s are unobservable.⁵ Indeed, from
the definition (2.A.19) of $f(\alpha)$, only positive $f(\alpha)$’s correspond to genuine
fractal dimensions and are thus observable: this is because they correspond to
more than a few points of observations in the limit $\Delta t \ll T$. The key remark of
Muzy et al. [365] is therefore that the observable exponent $b_{\rm obs}$ for an infinite
time series will be the largest positive q such that $f(\alpha) > 0$:


    $b_{\rm obs} = \sup\{q,\ q > 1,\ f(\alpha) > 0\}$.    (2.3)


4 From the correlation function of the log-volatility, from the scaling approach using
the multifractal spectrum or from the generalized method of moments [321].

5 Mandelbrot has shown that they can be interpreted in terms of singularities of
large deviations.
Using the form (2.A.36) together with (2.A.20), we obtain $b_{\rm obs} = \sqrt{2}/\lambda$. For
a financial time series of finite length L, Muzy et al. [365] have shown in
addition that the observed exponent is further reduced as a function of the
ratio $\ln N_L / \ln N_T$ of the logarithm of the number $N_L = L/T$ of integral scales
over the logarithm of the number $N_T = T/\Delta t$ of data points per integral scale.
This makes a huge difference: rather than tail indices in the range 15-50, this
gives observable tail indices in the range 3-5, as observed empirically.
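
The two exponents quoted for the MRW can be recovered in a few lines from the spectrum $\zeta(q)$. The sketch below assumes the usual convention of the multifractal formalism for the Legendre transformation, $f(\alpha) = 1 + \min_q [q\alpha - \zeta(q)]$; this convention is an assumption here, since (2.A.20) is not reproduced in this section.

```latex
\begin{align*}
\zeta(q) &= \tfrac{1}{2}\left[\,q - q(q-2)\lambda^2\,\right], \\
\zeta(q) = 1 &\iff \lambda^2 q^2 - (1 + 2\lambda^2)\,q + 2 = 0
             \iff (q - 2)\,(\lambda^2 q - 1) = 0, \\
\alpha(q) &= \zeta'(q) = \tfrac{1}{2} + \lambda^2 - \lambda^2 q, \\
f(\alpha(q)) &= 1 + q\,\alpha(q) - \zeta(q) = 1 - \tfrac{1}{2}\,\lambda^2 q^2, \\
f(\alpha(q)) \ge 0 &\iff q \le \sqrt{2}/\lambda .
\end{align*}
```

For small $\lambda^2$, the larger root of $\zeta(q) = 1$ is $q = 1/\lambda^2$, which is the exponent b of (2.2), while the positivity of $f(\alpha)$ caps the observable exponent at $\sqrt{2}/\lambda$, as in (2.3).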

See also [28] for extensions of the MRW to log infinitely divisible processes 
and [39, 341] for other applications of the multifractal process to the modeling 
of asset returns dynamics. 

A different point of view on the underlying mechanism for the power law
tails of price fluctuations has been proposed by Gabaix et al. [194, 195]. In
essence, their proposal is that price variations are driven by fluctuations in
the volume of transactions, V, whose cumulative distribution function $F_V$ has
a regularly-varying tail with a universal exponent $\gamma \approx 1.5$. The fluctuations
in the volume of transactions are argued in addition to be modulated by a
deterministic market impact function of the form $r = kV^{\beta}$, which describes
the response of prices to transactions, where r is the change in the logarithm
of price resulting from a transaction of volume V, k is a constant and $\beta = 0.5$.
This relationship can be derived from the assumption that agents are profit
maximizers. These two ingredients $F_V(V) \sim 1 - 1/V^{\gamma}$ and $r = kV^{\beta}$ imply
that large price returns r have also a power law distribution with exponent
$\mu = \gamma/\beta \approx 3$. Gabaix et al. find that their theory is consistent with the
data. It is important to stress that these results are obtained by using aggregated
data over a fixed time interval. Farmer and Lillo [165] argue that
aggregating the data in time complicates the discussion, since the functional 
form of the market impact generally depends on the length of the time in- 
terval. They find that the same analysis based on individual (rather than 
time-aggregated) transactions does not confirm Gabaix et al.’s results and 
they suggest that the tail of price changes is driven by fluctuations in liq- 
uidity rather than in the volume of transactions. However, Plerou et al. [387] 
make the important point that individual transactions do not reflect true or- 
ders, especially the large ones, since the large orders of a large fund, say, are 
generally split in several transactions. Thus, the correct observable seems in- 
deed to be the time-aggregated volume (albeit with variations in timespan), 
rather than individual trades. It is probable, however, that both mechanisms of 
fluctuations in the volume of transactions and in liquidity play a role in deter- 
mining the statistics of price changes. In addition, the mechanism proposed by 
Gabaix et al. transfers the question of the origin of the power law distribution 
of the returns to the open question of the origin of the power law distribution 
of the volumes of transactions, which could reflect the power law distribu- 
tion of the fund sizes, since the larger a fund is, the larger its orders can 
be expected to be. However, the distribution of the top 10% mutual funds 
[194] and of firm sizes [25, 376] are found to be regularly varying with a tail 
index close to 1, significantly smaller than the value 1.5 of the exponent of 
the distribution of the volumes of transactions. Unraveling the origin of this
exponent 1.5 thus requires an understanding of the strategies of investors and 
how they organize, fragment, and delay their orders. 
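
The step from the two ingredients of Gabaix et al. to the tail exponent of the returns is a one-line change of variable. The following worked derivation is a sketch assuming a deterministic, increasing impact function $r = kV^{\beta}$ with $k > 0$ and $\beta > 0$, as stated above.

```latex
\begin{align*}
\Pr[r > x] &= \Pr\!\left[k V^{\beta} > x\right]
            = \Pr\!\left[V > (x/k)^{1/\beta}\right]
            = 1 - F_V\!\left((x/k)^{1/\beta}\right) \\
           &\sim \left(\frac{x}{k}\right)^{-\gamma/\beta}
            \quad (x \to \infty),
\qquad \mu = \frac{\gamma}{\beta} \approx \frac{1.5}{0.5} = 3 .
\end{align*}
```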


2.2.3 Empirical Search for Power Law Tails 
and Possible Alternatives 


In the early 1960s, Mandelbrot [339] and Fama [157] presented evidence that 
distributions of returns can be well approximated by a symmetric Lévy stable 
law with tail index b about 1.7. These estimates of the tail index have recently 
been supported by Mittnik et al. [362], and slightly different indices of the 
stable law (b = 1.4) were suggested by Mantegna and Stanley [345, 346]. 

On the other hand, there is abundant evidence of a larger value of the
tail index $b \approx 3$ [217, 312, 320, 322, 367]. See also the various alternative
parameterizations in terms of the Student distribution [62, 275], or Pearson 
Type-VII distributions [368], which all have an asymptotic power law tail and 
are regularly varying. Thus, a general conclusion of this group of authors con- 
cerning tail fatness can be formulated as follows: the tails of the distribution 
of returns are heavier than a Gaussian tail and heavier than an exponential 
tail; they certainly admit the existence of a finite variance (b > 2), whereas 
the existence of the third (skewness) and the fourth (kurtosis) moments is 
questionable. 

These two classes of results are contradictory only on the surface, because 
they actually do not apply to the same quantiles of the distributions of re- 
turns. Indeed, Mantegna and Stanley [345] have shown that the distribution 
of returns of the Standard & Poor’s 500 index can be described accurately 
by a Lévy stable law only within a limited range up to about 5 standard 
deviations, while a faster decay (approximated by an exponential or a power 
law with larger exponent) of the distribution is observed beyond. This almost- 
but-not-quite Lévy stable description could explain (at least, in part) the slow 
convergence of the distribution of returns to the Gaussian law under time aggregation
[72, 451]; and it is precisely outside this range of up to 5 standard
deviations, where the Lévy law no longer applies, that a tail index b = 3
has been estimated. Indeed, most authors who have reported a tail index b = 3
have used some optimality criteria for choosing the sample fractions (i.e., the 
largest values) for the estimation of the tail index. Thus, unlike the authors 
supporting stable laws, they have used only a fraction of the largest (positive 
tail) and smallest (negative tail) sample values. 

It would thus seem that all has been said on the distributions of returns.
However, there are still dissenting views in the literature. Indeed, the class
of regularly varying distributions is not the sole one able to account for the
large kurtosis and fat-tailedness of the distributions of returns. Some recent
works suggest alternative descriptions for the distributions of returns. For instance,
Gouriéroux and Jasiak [208] claim that the distribution of returns on
the French stock market decays faster than any power law. Cont et al. [108]
have proposed to use exponentially truncated stable distributions, Barndorff-Nielsen
[37], Eberlein et al. [140] and Prause [393] have respectively considered
normal inverse Gaussian and (generalized) hyperbolic distributions, which asymptotically
decay as $x^{\alpha} \cdot \exp(-\beta x)$, while Laherrère and Sornette [286] suggest
fitting the distributions of stock returns by the Stretched-Exponential
law.⁶ Of the same type are the marginal distributions of the so-called CGMY
model proposed by Carr et al. [90]. These results, challenging the traditional
hypothesis of a power-like tail, offer a new representation of the returns distributions.

In addition, real financial time series exhibit (G)ARCH effects [65, 66] leading
to heteroscedasticity and to clustering of high threshold exceedances due
to a long memory of the volatility. These rather complex dependence structures
make the blind application of standard statistical tools for data analysis
difficult, if not questionable. In particular, the existence of significant dependence in
the return volatility leads to a significant bias and an increase
of the true standard deviation of the statistical estimators of tail indices.
Indeed, there are now many examples showing that dependences and long
memories as well as non-linearities mislead standard statistical tests (see for
instance [12, 216]). Consider Hill’s and Pickands’ estimators, which play an
important role in the study of the tails of distributions. It is often overlooked
that, for dependent time series, Hill’s estimator remains consistent but is
not asymptotically efficient [416]. Moreover, for financial time series with a dependence
structure described by an IGARCH process, it has been shown that
the standard deviation of Hill’s estimator obtained by a bootstrap method
can be seven to eight times larger than the standard deviation derived under
the asymptotic normality assumption [267]. These figures are even worse for
Pickands’ estimator.


2.3 Constraints from Extreme Value Theory 


The application of extreme value theory (EVT) to the investigation of the 
properties of the distributions of asset returns has grown rapidly during the 
last decade. Longin [312] was one of the main promoters of this method and 
has advocated its use for risk management purposes [313], particularly for 
Value-at-Risk assessment and stress testing. The conclusions drawn from the 
various studies of empirical distributions of log-returns, based on the extreme 
value theory, show that they should belong to the maximum domain of at- 
traction of the Fréchet distribution, so that they are necessarily regularly 
varying laws. However, most of these studies have been performed under the 


⁶ Picoli et al. [385] have also presented fits comparing the relative merits of
Stretched-Exponential and so-called q-exponentials (which are similar to Student
distributions with power law tails) for the description of the frequency
distributions of basketball baskets, cyclone victims, brand-name drugs by retail
sales, and highway lengths.
restrictive assumption that (i) financial time series are made of independent
and identically distributed returns, and (ii) the corresponding distributions
of returns belong to one of only three possible maximum domains of attraction.⁷
However, these assumptions are not fulfilled in general. While Smith’s
results [444] indicate that the dependence of the data does not constitute a 
major problem in the limit of large samples, so that volatility clustering of 
financial data does not prevent the reliability of EVT, we shall see that it 
can significantly bias standard statistical tools for samples of size commonly 
used in extreme tails studies. Moreover, the conclusions of many studies are 
essentially based on an aggregation procedure which stresses the central part 
of the distribution while smoothing and possibly distorting the characteristics 
of the tail (whose properties are obviously essential in characterizing the tail 
behavior). 

The question then arises whether the limitations of these statistical tools 
could have led to erroneous conclusions about the tail behavior of the distrib- 
utions of returns. In this section, presenting tests performed on synthetic time 
series with time dependence in the volatility with both Pareto and Stretched 
Exponential (SE) distributions, and on two empirical time series (the daily 
returns of the Dow Jones Industrial Average Index over a century (n = 28415 
data points) and the 5-minute returns of the Nasdaq Composite index over 1 
year from April 1997 to May 1998 (n = 22123 data points)), we exemplify the 
fact that the standard generalized extreme value (GEV) estimators can be 
quite inefficient due to the possibly slow convergence toward the asymptotic 
theoretical distribution and the existence of biases in the presence of depen- 
dence between data. Thus, one cannot reliably distinguish between rapidly 
and regularly varying classes of distributions. The generalized Pareto distri- 
bution (GPD) estimators work better, but still lack power in the presence of 
strong dependence. Note that the two empirical data sets used in the illus- 
tration below are justified by their similarity with (i) the data set of daily 
returns used in [312] particularly, and (ii) the high frequency data used in 
[217, 322, 367] among others.


⁷ Extensions of the asymptotic theory of extreme values to correlated sequences
have been developed by Berman [48, 49] for Gaussian sequences and Loynes [318], 
O’Brien [374], Leadbetter [293] and others [369] in the more general context of 
stationary sequences satisfying mixing conditions. See also Kotz and Nadarajah 
[277] for the limit distribution of extreme values of 2D correlated random vari- 
ables. Recently, there is a growing interest in the extreme value theory of strongly 
correlated random variables in many areas of science, including applications to 
diffusing particles in correlated random potentials [89], to the understanding of 
large deviations in spin glass ground state energies [13], to front propagation and 
fluctuations [378], fragmentation, binary search tree problem in computer science 
[325, 326], to maximal height of growing surfaces [399], to the Hopfield model of 
brain learning [75], and so on. 
2.3.1 Main Theoretical Results on Extreme Value Theory


Two limit theorems allow one to study the extremal properties and to deter- 
mine the maximum domain of attraction (MDA) of a distribution function in 
two forms. 

First, consider a sample of N iid realizations $X_1, X_2, \ldots, X_N$ of a random
variable. Let $X_N^{\wedge}$ denote the maximum of this sample.⁸ Then, the Gnedenko
theorem states that if, after an adequate centering and normalization, the
distribution of $X_N^{\wedge}$ converges to a non-degenerate distribution as N goes to
infinity, this limit distribution is then necessarily the generalized extreme value
(GEV) distribution defined by

$$H_\xi(x) = \exp\left[-(1 + \xi \cdot x)^{-1/\xi}\right], \tag{2.4}$$

with $x \in [-1/\xi, \infty)$ if $\xi > 0$ and $x \in (-\infty, -1/\xi]$ if $\xi < 0$. When $\xi = 0$,
$H_\xi(x)$ should be understood as

$$H_0(x) = \exp[-\exp(-x)], \quad x \in \mathbb{R}. \tag{2.5}$$

Thus, for N large enough

$$\Pr\left[X_N^{\wedge} < x\right] \simeq H_{\xi_N}\left(\frac{x - \mu_N}{\psi_N}\right) \tag{2.6}$$

for some value of the centering parameter $\mu_N$, scale factor $\psi_N$ and form
parameter $\xi_N$. The form parameter $\xi$ is of paramount importance for the shape of
the limiting distribution. Its sign determines the three possible limiting forms
of the GEV distribution of maxima (2.4):

1. If $\xi > 0$, the limit distribution is the (shifted) Fréchet power-like distribution;

2. If $\xi = 0$, the limit distribution is the Gumbel (double-exponential) distribution;

3. If $\xi < 0$, the limit distribution has a support bounded from above.

The determination of the parameter $\xi$ is the central problem of extreme
value analysis. Indeed, it allows one to determine the maximum domain of
attraction of the underlying distribution and therefore its behavior in the tails.
When $\xi > 0$, the underlying distribution belongs to the Fréchet maximum
domain of attraction and is regularly varying (power-like tail). When $\xi = 0$, it
belongs to the Gumbel maximum domain of attraction and is rapidly varying
(exponential tail), while if $\xi < 0$ it belongs to the Weibull maximum domain
of attraction and has a finite right endpoint, which means that there exists a
finite $x_F$ such that $X \le x_F$ with probability one.
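The three cases above can be made concrete with a short Python sketch that evaluates the GEV distribution function (2.4)-(2.5) and names the corresponding domain of attraction. The function names `gev_cdf` and `domain_of_attraction` are illustrative choices, not notation from the text, and the values returned outside the support (0 below it, 1 above it) are the usual convention, stated here as an assumption.

```python
import math

def gev_cdf(x, xi):
    """Standard GEV distribution function H_xi(x) of (2.4), with (2.5) for xi = 0."""
    if xi == 0.0:
        return math.exp(-math.exp(-x))      # Gumbel case (2.5)
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def domain_of_attraction(xi):
    """Name of the maximum domain of attraction selected by the sign of xi."""
    if xi > 0.0:
        return "Frechet (regularly varying, power-like tail)"
    if xi == 0.0:
        return "Gumbel (rapidly varying tail)"
    return "Weibull (finite right endpoint)"
```

For instance, `gev_cdf(0.0, 0.5)` and `gev_cdf(0.0, 0.0)` both equal exp(-1), since all members of the family coincide at x = 0.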


⁸ Similar results hold for the minimum of the sample, since
$\min\{X_1, \ldots, X_N\} = -\max\{-X_1, \ldots, -X_N\}$.
The usefulness of formula (2.6) for risk assessment purposes seems obvious
as it provides a universal estimation of the Value-at-Risk. If X denotes the
profit and loss, $X_N^{\wedge}$ represents the largest among N losses. The Value-at-Risk
at confidence level $\alpha$, denoted by $\mathrm{VaR}_\alpha$, is given by the unique solution of:

$$F_X(\mathrm{VaR}_\alpha) = \Pr[X < \mathrm{VaR}_\alpha] = \alpha, \tag{2.7}$$

provided that $F_X$ is increasing.⁹ For N iid observations of the profits and
losses, we have

$$\Pr\left[X_N^{\wedge} < \mathrm{VaR}_\alpha\right] = \Pr[X < \mathrm{VaR}_\alpha]^N = \alpha^N, \tag{2.8}$$

so that $\mathrm{VaR}_\alpha$ is (asymptotically) solution of

$$H_{\xi_N}\left(\frac{\mathrm{VaR}_\alpha - \mu_N}{\psi_N}\right) = \alpha^N, \tag{2.9}$$

which with (2.4) yields

$$\mathrm{VaR}_\alpha \simeq \mu_N + \frac{\psi_N}{\xi_N}\left[(-N \ln \alpha)^{-\xi_N} - 1\right]. \tag{2.10}$$

When the observations are not iid, one can generally replace N by $\theta \cdot N$, where
$\theta \in [0,1]$ is the so-called extremal index [146, 293, 313], related to the size of
the clusters of extremes which may appear when the data exhibit temporal
dependence. Indeed, generally speaking, one can write [146, p.419]:

$$\Pr\left[X_N^{\wedge} < \mathrm{VaR}_\alpha\right] \simeq \Pr[X < \mathrm{VaR}_\alpha]^{\theta N} = \alpha^{\theta N}, \tag{2.11}$$

so that

$$\mathrm{VaR}_\alpha \simeq \mu_N + \frac{\psi_N}{\xi_N}\left[(-\theta N \ln \alpha)^{-\xi_N} - 1\right]. \tag{2.12}$$
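As a minimal numerical sketch of (2.10) and (2.12), the following Python function evaluates the GEV-based Value-at-Risk from the fitted parameters of the distribution of maxima. The name `var_gev`, the argument names, and the explicit handling of the Gumbel limit (form parameter equal to zero, where the bracket divided by the form parameter tends to minus the logarithm of its base) are assumptions made for illustration only.

```python
import math

def var_gev(alpha, n, mu, psi, xi, theta=1.0):
    """Value-at-Risk at confidence level alpha from GEV parameters.

    mu, psi, xi : centering, scale and form parameters of (2.6)
    n           : number of observations over which the maximum is taken
    theta       : extremal index in (0, 1]; theta = 1 gives (2.10),
                  theta < 1 gives the dependent-data version (2.12)
    """
    m = -theta * n * math.log(alpha)        # the quantity (-theta N ln alpha)
    if xi == 0.0:
        return mu - psi * math.log(m)       # limit of the formula as xi -> 0
    return mu + psi / xi * (m ** (-xi) - 1.0)
```

A quick sanity check: when alpha is chosen so that -N ln(alpha) = 1, the bracket vanishes and the function returns the centering parameter itself.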


The second limit theorem is named after Gnedenko-Pickands-Balkema-de
Haan (GPBH) and its formulation is as follows [146, pp. 152-168] (see also
[451, Chap. 1] for an intuitive exposition). In order to state the GPBH theorem,
let us define the right endpoint $x_F$ of a distribution function F(x) as
$x_F = \sup\{x : F(x) < 1\}$. Let us call the function

$$\Pr\{X - u \ge x \mid X > u\} \equiv \bar{F}_u(x) \tag{2.13}$$

the excess distribution function. Then, this (survival) distribution function
$\bar{F}_u(x)$ belongs to the maximum domain of attraction of $H_\xi(x)$ defined by
(2.4) if and only if there exists a positive scale-function s(u), depending on
the threshold u, such that


⁹ For a more general definition of VaR, see (3.85) page 125.
$$\lim_{u \to x_F} \; \sup_{0 \le x \le x_F - u} \left| \bar{F}_u(x) - \bar{G}(x \mid \xi, s(u)) \right| = 0, \tag{2.14}$$

where $\bar{G} = 1 - G$ and

$$G(x \mid \xi, s) = 1 + \ln H_\xi\left(\frac{x}{s}\right) = 1 - \left(1 + \xi \cdot \frac{x}{s}\right)^{-1/\xi} \tag{2.15}$$

is called the generalized Pareto distribution (GPD). By taking the limit $\xi \to 0$,
expression (2.15) leads to the exponential distribution. The support of the
distribution function (2.15) is defined as follows:

$$\begin{cases} 0 \le x < \infty, & \text{if } \xi \ge 0, \\ 0 \le x \le -s/\xi, & \text{if } \xi < 0. \end{cases} \tag{2.16}$$

Thus, the GPD has a finite support for $\xi < 0$.
Again, this theorem has important practical implications for risk management,
since it provides a general assessment of the expected-shortfall of
a position X associated with a given distribution of profits and losses. The
expected-shortfall, at confidence level $\alpha$, is given by:

$$\mathrm{ES}_\alpha = \mathrm{E}[X \mid X > \mathrm{VaR}_\alpha], \tag{2.17}$$

which can be evaluated with the help of relation (2.15):

$$\mathrm{ES}_\alpha = \mathrm{VaR}_\alpha + \frac{s(\mathrm{VaR}_\alpha)}{1 - \xi}, \tag{2.18}$$

with $\xi < 1$, in order for the expectation to exist.¹⁰
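To illustrate how the GPD enters the expected-shortfall, here is a small Python sketch. It assumes that the excesses over the threshold u = VaR follow the GPD (2.15) exactly with scale s and form parameter below one, in which case the mean excess is s / (1 - form parameter); the function names `gpd_survival` and `expected_shortfall_gpd` are hypothetical.

```python
import math

def gpd_survival(x, xi, s):
    """Survival function 1 - G(x | xi, s) of the GPD (2.15), for x >= 0."""
    if xi == 0.0:
        return math.exp(-x / s)             # exponential limit as xi -> 0
    t = 1.0 + xi * x / s
    if t <= 0.0:
        return 0.0                          # beyond the finite endpoint -s/xi (xi < 0)
    return t ** (-1.0 / xi)

def expected_shortfall_gpd(var_alpha, xi, s):
    """VaR plus the mean excess of a GPD with scale s; requires xi < 1."""
    if xi >= 1.0:
        raise ValueError("the expectation does not exist for xi >= 1")
    return var_alpha + s / (1.0 - xi)
```

With `var_alpha = 10`, `xi = 0.5` and `s = 2`, the mean excess is 4, so the sketch returns 14.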

As a note of caution, it should be underlined that the existence of a non-degenerate
limit distribution of properly centered and normalized maxima
$X_N^{\wedge}$ or peaks over threshold $X - u \mid X > u$ is a rather strong requirement.
There are many distribution functions which do not satisfy this condition,
e.g., infinitely alternating functions between a power-like and an exponential
behavior.


2.3.2 Estimation of the Form Parameter and Slow Convergence 
to Limit Generalized Extreme Value (GEV) 
and Generalized Pareto (GPD) Distributions 


There exist two main ways of estimating the form parameter $\xi$. First, if there is
a sample of maxima (taken from subsamples of sufficiently large size), then one 
can fit to this sample the GEV distribution, thus estimating the parameters 
by the maximum likelihood method, for instance. Alternatively, one can prefer 
the distribution of exceedances over a large threshold given by the GPD (2.15), 
whose tail index can be estimated with Pickands’ estimator or by maximum 


¹⁰ Recall that $\xi$ is the inverse of the tail exponent.
likelihood, as previously. Hill’s estimator cannot be used in the present case
since it assumes $\xi > 0$, while the essence of extreme value analysis is, as we
said, to test for all the classes of limit distributions without excluding any
possibility, and not only to determine the quantitative value of an exponent.
Each of these methods has its advantages and drawbacks, especially when one
has to study dependent data, as we show below.

Given a sample of size N, one can consider the q maxima drawn from
q subsamples of size p (such that $p \cdot q = N$) to estimate the parameters
$(\mu, \psi, \xi)$ in (2.6) by maximum likelihood. This procedure yields consistent and
asymptotically efficient Gaussian estimators, provided that $\xi > -1/2$ [444].
The properties of the estimators still hold approximately for dependent data, 
provided that the interdependence remains weak. However, it is difficult to 
choose the optimal value q of the number of subsamples as it depends both on 
the size N of the entire sample and on the underlying distribution: the maxima 
drawn from an exponential distribution are known to converge very quickly 
to Gumbel’s distribution [220], while for the Gaussian law, convergence is 
particularly slow [219]. 
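The subsampling step described above can be sketched in a few lines of Python. The helper name `block_maxima` is an illustrative choice, and dropping any incomplete trailing block is an assumption made here so that exactly q = N // p maxima are returned; the maximum likelihood fit of the GEV to these maxima is not shown.

```python
def block_maxima(sample, p):
    """Split `sample` into consecutive blocks of size p and return the q block maxima.

    Observations left over after the last complete block are discarded
    (an assumption of this sketch).
    """
    if p < 1:
        raise ValueError("block size p must be at least 1")
    q = len(sample) // p
    return [max(sample[i * p:(i + 1) * p]) for i in range(q)]
```

Choosing p is the trade-off discussed in the text: larger blocks bring the maxima closer to their asymptotic law but leave fewer maxima to fit.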

The second possibility is to estimate the parameter $\xi$ from the distribution
of exceedances (i.e., from the GPD). For this, one can use either the
maximum likelihood estimator or Pickands’ estimator. Maximum likelihood
estimators are well known to be asymptotically the most efficient ones (at least
for $\xi > -1/2$ and for independent data) but, in this particular case, Pickands’
estimator works reasonably well. Given a sample of size N sorted in decreasing
order, $x_1 \ge x_2 \ge \cdots \ge x_N$, Pickands’ estimator is given by

$$\hat{\xi}_{k,N} = \frac{1}{\ln 2} \ln \frac{x_k - x_{2k}}{x_{2k} - x_{4k}}. \tag{2.19}$$

For independent and identically distributed data, this estimator is consistent
provided that k is chosen so that $k \to \infty$ and $k/N \to 0$ as $N \to \infty$.
Moreover, $\hat{\xi}_{k,N}$ is asymptotically normal with variance

$$\sigma(\hat{\xi}_{k,N})^2 \cdot k = \frac{\xi^2 \left(2^{2\xi+1} + 1\right)}{\left(2 (2^{\xi} - 1) \ln 2\right)^2}, \quad \text{as } N \to \infty. \tag{2.20}$$

In the presence of dependence between data, one can expect an increase of 
the standard deviation, as reported by Kearns and Pagan [267]. For time 
dependence of the GARCH class, they have indeed demonstrated a signifi- 
cant increase of the standard deviation of the tail index estimator, such as 
Hill’s estimator, by a factor more than seven with respect to their asymptotic 
properties for iid samples. This leads to very inaccurate index estimates for 
time series with this kind of temporal dependence. Another problem lies in 
the determination of the optimal threshold u of the GPD, which is in fact 
related to the optimal determination of the subsamples size p in the case of 
the estimation of the parameters of the distribution of maximum. 
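Pickands’ estimator (2.19) and its asymptotic standard deviation (2.20) are simple enough to sketch directly. In the Python code below, the sample is sorted in decreasing order so that the indices k, 2k and 4k pick out upper order statistics; the function names and the explicit limit used when the form parameter is exactly zero are assumptions of this sketch. Ties among the selected order statistics would make the ratio undefined and are not handled.

```python
import math

def pickands_estimator(sample, k):
    """Pickands' estimate of the form parameter, eq. (2.19)."""
    if k < 1 or 4 * k > len(sample):
        raise ValueError("need 1 <= k and 4k <= len(sample)")
    x = sorted(sample, reverse=True)        # x[0] >= x[1] >= ... (decreasing order)
    xk, x2k, x4k = x[k - 1], x[2 * k - 1], x[4 * k - 1]   # 1-based ranks k, 2k, 4k
    return math.log((xk - x2k) / (x2k - x4k)) / math.log(2.0)

def pickands_asymptotic_std(xi, k):
    """Asymptotic standard deviation implied by eq. (2.20) for iid data."""
    ln2 = math.log(2.0)
    if xi == 0.0:
        var_times_k = 3.0 / (4.0 * ln2 ** 4)        # limit of (2.20) as xi -> 0
    else:
        var_times_k = (xi ** 2 * (2.0 ** (2.0 * xi + 1.0) + 1.0)
                       / (2.0 * (2.0 ** xi - 1.0) * ln2) ** 2)
    return math.sqrt(var_times_k / k)
```

For the toy sample `[1, 2, 3, 7]` with `k = 1`, the ratio of spacings is (7 - 3) / (3 - 1) = 2, so the estimate is 1.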

In order to compare the performance of the various estimators of the tail
index $\xi$ for iid data, Malevergne et al. [329] have considered several numerically
generated samples respectively drawn from (i) an asymptotic power law distribution
with tail index b = 3, (ii) a SE distribution, i.e., such that

$$\ln \Pr[X \ge x] \propto -x^{c}, \quad \text{as } x \to \infty, \tag{2.21}$$

with fractional exponent c = 0.7 and (iii) a SE with fractional exponent
c = 0.3. Considering 1000 replications of each of these three samples (made of
10,000 data each), they show that the estimates of $\xi$ obtained from the distribution
of maxima (2.6) are compatible (at the 95% confidence level) with the
theoretical value for the first two distributions (Pareto and SE with c = 0.7)
as soon as the size p of the subsamples, from which the maxima are drawn,
is larger than 10. For the SE with fractional exponent c = 0.3, an average
value of $\xi$ larger than 0.2 is obtained even for large subsample sizes (p = 200).
This value is reported to be significantly different from the theoretical value
$\xi = 0.0$. These results clearly show that the distribution of the maximum
drawn from a SE distribution with c = 0.7 converges quickly toward the theoretical
asymptotic GEV distribution, while for c = 0.3 the convergence is
very slow. A fast convergence for c = 0.7 is not surprising since, for this value
of the fractional index c, the SE distribution remains close to the exponential
distribution, which is known to converge very quickly to the GEV distribution
[220]. For c = 0.3, the SE distribution behaves, over a wide range, like
the power law (see page 59 hereafter for a theoretical formalization with an
exact embedding of the power law into the SE family). Thus, it is not surprising
to obtain an estimate of $\xi$ which remains significantly positive for SE
distributions with small exponents c.

Overall, the results reported in [329] are slightly better for the maximum
likelihood estimates obtained from the GPD. Indeed, the bias observed for
the SE with c = 0.3 seems smaller for large quantiles than the smallest biases
reached by the GEV method. Thus, it appears that the distribution of exceedances
converges faster to its asymptotic distribution than the distribution
of maxima. However, while in line with the theoretical values, the standard
deviations are found to be almost always larger than in the previous case,
which testifies to the higher variability of this estimator. Thus, for samples of
size 10,000 or so (a typical size for most financial samples), the GEV and
GPD maximum likelihood estimates should be handled with care and their
results interpreted with caution due to possibly important bias and statistical
fluctuations. While a small value of $\xi$ seems to allow one to reliably conclude in
favor of a rapidly varying distribution, a positive estimate does not appear
informative, and in particular does not allow one to reject the rapidly varying
behavior of a distribution. Pickands’ estimator does not perform better, insofar
as it is also unable to distinguish between a regularly varying distribution
and a SE with a low fractional exponent [329].

As another example illustrating the very slow convergence to the limit
distributions of the extreme value theory mentioned above, even with very
large samples, let us consider a simulated sample of iid random variables (we
thus fulfill the most basic assumption of extreme value theory, i.e., iid-ness)
with Weibull distribution defined by

$$F(x) = 1 - \exp\left[-\left(\frac{x}{d}\right)^{c}\right] \tag{2.22}$$

with parameter set (c > 0, d > 0), for $x \ge 0$. This distribution belongs
to the class of Stretched-Exponential distributions when the exponent c is
smaller than one, namely when the distribution decays more slowly than an
exponential distribution (but still faster than any power law). We consider
two values for the exponent of the Weibull distribution: c = 0.7 and c = 0.3,
with d = 1. Theoretically, using for instance the GPD of exceedances should
give estimated values of $\xi$ close to zero in the limit of large N, since the SE
distribution belongs to the basin of attraction of the Gumbel distribution. In
order to use the GPD, we construct the conditional Weibull distribution under
the condition $X > U_k$, k = 1, ..., 15, where the thresholds $U_k$ are chosen as:
$U_1 = 0.1$; $U_2 = 0.3$; $U_3 = 1$; $U_4 = 3$; $U_5 = 10$; $U_6 = 30$; $U_7 = 100$;
$U_8 = 300$; $U_9 = 1000$; $U_{10} = 3000$; $U_{11} = 10^4$; $U_{12} = 3 \cdot 10^4$;
$U_{13} = 10^5$; $U_{14} = 3 \cdot 10^5$ and $U_{15} = 10^6$.

For each simulation, the size of the sample above a given threshold $U_k$ is
set equal to 50,000 in order to get small standard deviations. The maximum-likelihood
estimates of the GPD form parameter $\xi$ are shown in Fig. 2.4 as a
function of the index k of $U_k$. For c = 0.7, the threshold $U_7$ gives an estimate
$\xi = 0.0123$ with standard deviation equal to 0.0045, i.e., the estimate for $\xi$
differs significantly from zero (recall that $\xi = 0$ is the theoretical limit value).
Fig. 2.4. Maximum likelihood estimates of the GPD form parameter $\xi$ in (2.15)
as a function of the index k of the thresholds $U_k$ defined in the text for
stretched-exponential samples of size 50,000 and their 95% confidence interval.
Reproduced from [329]
Stronger deviations from the correct value $\xi = 0$ are found for the smaller
thresholds $U_1, \ldots, U_6$ while the discrepancy abates for larger thresholds $U_k$
for k > 7. These results occur notwithstanding the huge size of the implied
data set; indeed, the probability $\Pr(X > U_7)$ for c = 0.7 is about $10^{-11}$, so
that in order to obtain a data set of conditional samples from an unconditional
data set of the size studied here (50,000 realizations above $U_7$), the size of such
an unconditional sample should be approximately $10^{11}$ times larger than the
number of “peaks over threshold.” It is practically impossible to have such
a sample. For c = 0.3, the convergence to the theoretical value zero is much
slower and the discrepancy with the correct value $\xi = 0$ remains even for
the largest financial data sets: for a single asset, the largest data sets, drawn
from high frequency data, are no larger than or of the order of one million
points;¹¹ the situation does not improve for data sets one or two orders of
magnitude larger as considered in [211], obtained by aggregating thousands
of stocks.¹² Thus, although the GPD form parameter should be theoretically
zero in the limit of a large sample for the Weibull distribution, this limit cannot
be reached for any available sample size. This is another clear illustration that
a rapidly varying distribution, like the SE distribution, can be mistaken for a
regularly varying distribution in any practical application.
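The orders of magnitude quoted above are easy to check numerically. The following Python sketch evaluates the Weibull survival probability at the threshold 100 for c = 0.7 and d = 1, and shows one way to draw the conditional sample directly, without generating the unconditional one: since Pr(X > x | X > u) = exp(-(x/d)^c + (u/d)^c), the quantity (X/d)^c - (u/d)^c is a unit exponential variable. The function names and the use of Python's `random` module are assumptions of this sketch, not the procedure of [329].

```python
import math
import random

def weibull_survival(x, c, d=1.0):
    """Pr(X > x) for the Weibull law (2.22)."""
    return math.exp(-((x / d) ** c))

def sample_exceedance(u, c, d=1.0, rng=None):
    """Draw one realization of X conditional on X > u by inverse transform."""
    rng = rng or random.Random()
    e = rng.expovariate(1.0)                # unit exponential variable
    return d * ((u / d) ** c + e) ** (1.0 / c)

p_u7 = weibull_survival(100.0, c=0.7)       # tail probability above the threshold 100
implied_size = 50_000 / p_u7                # unconditional size needed for 50,000 peaks
```

Here `p_u7` is of order 1e-11, so `implied_size` is of order 1e15 observations, which is why the conditional sample has to be simulated directly.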


2.3.3 Can Long Memory Processes Lead to Misleading Measures 
of Extreme Properties? 


As we already mentioned, Kearns and Pagan [267] have reported how misleading
Hill’s and Pickands’ estimators can be in the presence of dependence in
data. Focusing on IGARCH processes, they show that the estimated standard
deviations of these estimators increase significantly with respect to the theoretical
standard deviations derived under the iid assumption. They also find
an important bias. Generalizing these results, the study by Malevergne et al.
[329] shows that the presence of simple Markovian time dependences is sufficient
to draw erroneous conclusions from GEV or GPD maximum likelihood
estimates and Pickands estimates as well. Considering Markovian processes
with different stationary distributions including a regularly varying distribution
with the tail index b = 3 and two SEs with fractional exponents c = 0.3
and c = 0.7, they report the presence of a significant downward bias (with
respect to the iid case) in almost every situation for the GPD estimates: the
stronger the dependence (measured by the correlation time varying from 20
to 100), the more important is the bias. At the same time, the empirical values
of the standard deviations remain comparable with those obtained for iid


¹¹ One year of data sampled at the 1-minute time scale gives approximately
$1.2 \cdot 10^5$ data points.

¹² In this case, another issue arises concerning the fact that the aggregation of
returns from different assets may distort the information and the very structure
of the tails of the probability density functions (pdf), if they exhibit some intrinsic
variability [351].
data. The downward bias can be ascribed to the dependence between data.
Indeed, positive dependence yields important clustering of extremes and accumulation
of realizations around some values, which, for small samples,
could (misleadingly) appear as the consequence of the compactness of the
support of the underlying distribution. In other words, for finite samples, the
dependence prevents the full exploration of the tails and creates clusters that
mimic a thinner tail (even if the clusters all occur at large values, since
the range of exploration of the tail controls the value of $\xi$).

The situation is different for the GEV estimates, which exhibit biases that
can be either upward or downward (with respect to the iid case). For the GEV
estimates, two effects are competing. On the one hand, the dependence creates
a downward bias, as explained above, while, on the other hand, the lack
of convergence of the distribution of maxima toward its GEV asymptotic distribution
results in an upward bias, as observed on iid data (see the previous
section). This last phenomenon is strengthened by the existence of time dependence,
which decreases the “effective” sample size (the actual size
divided by the correlation time of the time series) and thus slows down the
convergence rate toward the asymptotic distribution even more. Interestingly,
both the GEV and GPD estimators for the Pareto distribution may be utterly
wrong in the presence of long-range dependence for any cluster size.

The same kind of results are reported for Pickands’ estimator. However,
the estimated standard deviations reported in [329] remain of the same order
as the theoretical ones, contrary to the results reported by [267] for IGARCH
processes. Nonetheless, in both studies, a very significant bias, either positive
or negative, is found, which can lead one to mistake a SE distribution for a
regularly varying distribution. Thus, in the presence of dependence, Pickands’
estimator becomes unreliable.

To summarize, the determination of the maximum domain of attraction 
with usual estimators does not appear to be a very efficient way to study the 
extreme properties of financial time series. Many studies on the tail behav- 
ior of the distributions of asset returns have focused on these methods (see 
the influential study [312] for instance) and may thus have led to spurious 
conclusions. In particular, the fact that rapidly varying distribution functions 
may be mistaken for regularly varying distribution functions casts doubts on 
the strength of the seeming consensus according to which the distributions of 
returns are regularly varying. It also casts doubts on the reliability of EVT
for risk assessment. If an accurate estimation of the shape parameter $\xi$ is
so difficult to reach, how can one hope to obtain trustworthy estimates of the
Value-at-Risk or expected-shortfall by use of EVT?


2.3.4 GEV and GPD Estimators of the Distributions of Returns 
of the Dow Jones and Nasdaq Indices 


As an illustration, let us apply the GEV and GPD estimators to the daily
returns of the Dow Jones Industrial Average Index over the last century and
to the 5-minute returns of the Nasdaq Composite index over 1 year from April
1997 to May 1998. These two time series are depicted in Fig. 2.5.

Fig. 2.5. Daily returns of the Dow Jones Industrial Average Index from 1900 to
2000 (left panel) and 5-minute returns of the Nasdaq Composite index over 1 year
from April 1997 to May 1998 (right panel)

For the intraday Nasdaq data, there are two caveats that must be addressed
before any estimation can be made. First, in order to remove the
effect of overnight price jumps, the intraday returns have to be determined
separately for each of the 289 days contained in the Nasdaq data. Then, the union
of all these 289 return data sets provides a better global return data set. Second,
the volatility of intraday data is known to exhibit a U-shape, also called
“lunch effect”, that is, an abnormally high volatility at the beginning and the
end of the trading day compared with a low volatility at the approximate time
of lunch. Such an effect is present in this data set and it is desirable to correct
it. Such a correction has been performed by renormalizing the 5-minute
returns at a given instant of the trading day by the corresponding average
absolute return at the same instant (when the average is performed over the
289 days). We shall refer to this time series as the corrected Nasdaq returns in
contrast with the raw (uncorrected) Nasdaq returns and we shall examine both
data sets for comparison.
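The renormalization used to remove the “lunch effect” can be sketched as follows in Python. The data layout (one list of 5-minute returns per trading day, all days having the same number of intraday bins) and the function name `correct_intraday_pattern` are assumptions made for illustration; the text does not specify how the data were stored.

```python
def correct_intraday_pattern(returns_by_day):
    """Divide each intraday return by the average absolute return of its time bin.

    returns_by_day[i][j] is the return of day i in intraday bin j.
    The average for bin j is taken over all days, as described in the text.
    """
    n_days = len(returns_by_day)
    n_bins = len(returns_by_day[0])
    avg_abs = [sum(abs(day[j]) for day in returns_by_day) / n_days
               for j in range(n_bins)]
    if any(a == 0.0 for a in avg_abs):
        raise ValueError("a time bin has zero average absolute return")
    return [[day[j] / avg_abs[j] for j in range(n_bins)]
            for day in returns_by_day]
```

After this correction, the average absolute return is equal to one in every intraday bin, so the U-shaped volatility profile is flattened by construction.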

The daily returns of the Dow Jones also exhibit some non-stationarity. 
Indeed, one can observe a clear excess volatility roughly covering the time of 
the bubble ending in the October 1929 crash followed by the Great Depres- 
sion. To investigate the influence of this non-stationarity, the statistical study 
presented below has been performed twice: first with the entire sample, and 
then after having removed the period from 1927 to 1936 from the sample. The 
results are somewhat different, but on the whole, the conclusions about the 
nature of the tail are the same. 

Although the distributions of positive and negative returns are known to be 
very similar (see for instance [256]), we have chosen to treat them separately. 
For the Dow Jones, this gives us 14949 positive and 13464 negative data points 
while, for the Nasdaq index, we have 11241 positive and 10751 negative data 
points. 
Given these precautionary measures, the analysis of the previous section
has been applied to the Dow Jones and Nasdaq (raw and corrected) returns.
In order to estimate the standard deviations of Pickands’ estimator for
the GPD derived from the upper quantiles of these distributions, and of the
Maximum Likelihood estimators of the distribution of the maximum and of
the GPD, we have randomly generated 1000 subsamples, each subsample being
constituted of 10,000 data points in the positive or negative parts of the
samples respectively (with replacement). It should be noted that the Maximum
Likelihood estimates themselves were derived from the full samples. The
results are given in Tables 2.2 and 2.3.
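The resampling scheme described above amounts to a bootstrap estimate of the standard deviation of an estimator. A minimal Python sketch is given below; the function name `bootstrap_std`, its default arguments (1000 subsamples of 10,000 points, as in the text) and the fixed seed are illustrative assumptions, and `estimator` stands for any function mapping a sample to a number, such as a tail-index estimator.

```python
import random
import statistics

def bootstrap_std(data, estimator, n_boot=1000, size=10_000, seed=0):
    """Standard deviation of `estimator` over subsamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=size)    # sampling with replacement
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)
```

Note that this procedure resamples individual observations and therefore destroys any temporal dependence present in the original series.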

These results confirm the difficulties in obtaining a clear conclusion con- 
cerning the nature of the tail behavior of the distributions of returns. In 
particular, it seems impossible to exclude a rapidly varying behavior of their 
tails. Even the estimations obtained with the maximum likelihood of the GPD 
tail index do not allow one to reject clearly the hypothesis that the tails of the 
empirical distributions of returns are rapidly varying, in particular for large 
quantile values. For the Nasdaq data set, accounting for the lunch effect does 
not yield any significant change in the estimations. 


2.4 Fitting Distributions of Returns 
with Parametric Densities 


Since it is particularly difficult to conclude with enough certainty on the reg- 
ularly or rapidly varying behavior of the tails of distributions of asset returns 
by using the nonparametric methods of the extreme value theory, it may be 
more appropriate to consider a parametric approach. However, in order to
avoid, or at least to lower, the risk of misspecification inherent in any parametric
approach, it is mandatory to use models as versatile as possible. In
particular, it is necessary to consider models which encompass both regularly
and rapidly varying distributions. Many examples of such models have been
described in the literature, such as the generalized t-distribution of McDonald
and Newey [353] or the q-exponential and q-Weibull distributions [385].

In the remainder of this section, relying on the results presented in [330],
we introduce two versatile families to characterize the behavior of the tails of 
asset return distributions. The implications of the choice of these parametric 
families for the assessment of tail risk will be discussed at the end of this 
chapter. 


2.4.1 Definition of Two Parametric Families 


A General 3-Parameters Family of Distributions 


We consider a general 3-parameters family of distributions and its particular 
restrictions corresponding to some fixed value(s) of one (two) parameters. This 
family is defined by its density function given by: 
Table 2.2. Mean values and standard deviations of the maximum likelihood estimates
of the parameter ξ for the distribution of the maximum (cf. (2.6)) when data
are grouped in samples of size 20, 40, 200, and 400 and for the generalized Pareto
distribution (2.15) for thresholds u corresponding to quantiles 90%, 95%, 99%, and
99.5%

(a) Dow Jones
Positive Tail                             Negative Tail
GEV                                       GEV
cluster   20     40     200    400        cluster   20     40     200    400
ξ         0.273  0.280  0.304  0.322      ξ         0.262  0.295  0.358  0.349
Emp Std   0.029  0.039  0.085  0.115      Emp Std   0.030  0.045  0.103  0.143
GPD                                       GPD
quantile  0.9    0.95   0.99   0.995      quantile  0.9    0.95   0.99   0.995
ξ         0.248  0.247  0.174  0.349      ξ         0.214  0.204  0.250  0.345
Emp Std   0.036  0.053  0.112  0.194      Emp Std   0.041  0.062  0.156  0.223
Theor Std 0.032  0.046  0.096  0.156      Theor Std 0.033  0.046  0.108  0.164

(b) Nasdaq (raw data)
GEV                                       GEV
cluster   20     40     200    400        cluster   20     40     200    400
ξ         0.209  0.193  0.388  0.516      ξ         0.191  0.175  0.292  0.307
Emp Std   0.031  0.115  0.090  0.114      Emp Std   0.030  0.038  0.094  0.162
GPD                                       GPD
quantile  0.9    0.95   0.99   0.995      quantile  0.9    0.95   0.99   0.995
ξ         0.200  0.289  0.389  0.470      ξ         0.143  0.202  0.229  0.242
Emp Std   0.040  0.058  0.120  0.305      Emp Std   0.040  0.057  0.143  0.205
Theor Std 0.036  0.054  0.131  0.196      Theor Std 0.035  0.052  0.118  0.169

(c) Nasdaq (corrected data)
GEV                                       GEV
cluster   20     40     200    400        cluster   20     40     200    400
ξ         0.090  0.175  0.266  0.405      ξ         0.099  0.132  0.138  0.266
Emp Std   0.029  0.039  0.085  0.187      Emp Std   0.030  0.041  0.079  0.197
GPD                                       GPD
quantile  0.9    0.95   0.99   0.995      quantile  0.9    0.95   0.99   0.995
ξ         0.209  0.229  0.307  0.344      ξ         0.165  0.160  0.210  0.054
Emp Std   0.039  0.052  0.111  0.192      Emp Std   0.039  0.052  0.150  0.209
Theor Std 0.036  0.052  0.123  0.180      Theor Std 0.036  0.050  0.116  0.143

Panel (a) gives the results for the Dow Jones index, panel (b) for the raw Nasdaq
index, and panel (c) for the Nasdaq index corrected for the “lunch effect.”
Reproduced from [329]
Table 2.3. Pickands' estimates (2.19) of the parameter ξ for the Generalized Pareto
Distribution (2.15) for thresholds u corresponding to quantiles 90%, 95%, 99% and
99.5% and two different values of the ratio N/k respectively equal to 4 and 10

(a) Dow Jones
                      Negative Tail                            Positive Tail
quantile     0.9      0.95     0.99     0.995        0.9      0.95     0.99     0.995
N/k = 4
ξ            0.2314   0.2944  -0.1115   0.3314       0.2419   0.4051  -0.3752   0.5516
emp. Std     0.1073   0.1550   0.3897   0.6712       0.0915   0.1274   0.3474   0.5416
th. Std      0.1176   0.1680   0.3563   0.5344       0.1178   0.1712   0.3497   0.5562
N/k = 10
ξ            0.3119   0.0890  -0.3452   0.9413       0.3462   0.3215   0.9111  -0.3873
emp. Std     0.1523   0.2219   0.8294   1.1352       0.1766   0.1929   0.6983   1.6038
th. Std      0.1883   0.2577   0.5537   0.9549       0.1894   0.2668   0.6706   0.7816

(b) Nasdaq (raw data)
N/k = 4
ξ            0.0493   0.0539  -0.0095   0.4559       0.0238   0.1511   0.1745   1.1052
emp. Std     0.1129   0.1928   0.4393   0.6205       0.1003   0.1599   0.4980   0.6180
th. Std      0.1147   0.1623   0.3601   0.5462       0.1143   0.1644   0.3688   0.6272
N/k = 10
ξ            0.2623   0.1583  -0.8781   0.8855       0.2885   0.1435   1.3734  -0.8395
emp. Std     0.1940   0.3085   0.9126   1.5711       0.2166   0.3220   0.7359   1.5087
th. Std      0.1868   0.2602   0.5543   0.9430       0.1876   0.2596   0.7479   0.7824

(c) Nasdaq (corrected data)
N/k = 4
ξ            0.2179   0.0265   0.3977   0.1073       0.2545  -0.0402  -0.0912   1.3915
emp. Std     0.1211   0.1491   0.4585   0.7206       0.1082   0.1643   0.4317   0.6220
th. Std      0.1174   0.1617   0.3822   0.5167       0.1180   0.1605   0.3570   0.6720
N/k = 10
ξ           -0.0878   0.4619   0.0329   0.3742       0.0877   0.3907   1.4680   0.1098
emp. Std     0.1882   0.2728   0.7561   1.1948       0.1935   0.2495   0.8045   1.2345
th. Std      0.1786   0.2734   0.5722   0.8512       0.1822   0.2699   0.7655   0.8172

Panel (a) gives the results for the Dow Jones, panel (b) for the raw Nasdaq data and
panel (c) for the Nasdaq corrected for the “lunch effect.” Reproduced from [329]
$$
f_u(x \mid b, c, d) =
\begin{cases}
A(b,c,d,u)\, x^{-(b+1)} \exp\left[-\left(\dfrac{x}{d}\right)^{c}\right] & \text{if } x \ge u > 0, \\
0 & \text{if } x < u.
\end{cases}
\tag{2.23}
$$


Here, b,c, and d are unknown parameters, u is a known lower threshold that 
will be varied for the purposes of analysis and A(b,c,d,u) is a normalizing 
constant given by the expression: 


$$
A(b,c,d,u) = \frac{d^{b}\, c}{\Gamma\left(-b/c,\, (u/d)^{c}\right)} , \tag{2.24}
$$

where $\Gamma(a,x)$ denotes the (non-normalized) incomplete Gamma function:

$$
\Gamma(a,x) = \int_x^{\infty} t^{a-1} e^{-t}\, dt . \tag{2.25}
$$


The parameter b ranges from minus infinity to infinity while c and d range 
from zero to infinity. In the particular case where c = 0, the parameter b also 
needs to be positive to ensure the normalization of the probability density 
function. The family (2.23) includes several well-known pdfs often used in 
different applications. We enumerate them. 


1. The Pareto distribution:

$$
F_u(x) = 1 - (u/x)^{b} , \tag{2.26}
$$

which corresponds to the set of parameters (b > 0, c = 0) with
$A(b,c,d,u) = b\, u^{b}$. Several works have attempted to derive or justify the
existence of a power tail of the distribution of returns from agent-based 
models [91], from optimal trading of large funds with sizes distributed ac- 
cording to the Zipf law, as recalled in Sect. 2.2.2, or from ad hoc stochastic 
processes [55, 445]. 

2. The Weibull distribution:

$$
F_u(x) = 1 - \exp\left[-\left(\frac{x}{d}\right)^{c} + \left(\frac{u}{d}\right)^{c}\right] \tag{2.27}
$$

with parameter set (b = -c, c > 0, d > 0) and normalization constant
$A(b,c,d,u) = \frac{c}{d^{c}} \exp\left[\left(\frac{u}{d}\right)^{c}\right]$. Recall that this distribution is said to be a
Stretched-Exponential distribution when the exponent c is smaller than
one, namely when the distribution decays more slowly than an exponential
distribution. Stationary distributions exhibiting this kind of tail arise, for
instance, from the so-called α-ARCH processes introduced in [132].

From a theoretical viewpoint, this class of distributions is motivated in
part by the fact that the large deviations of multiplicative processes
are generically distributed with Stretched-Exponential distributions [191].
Stretched-Exponential distributions are also parsimonious examples of the
important subset of subexponentials, that is, of the general class of
distributions decaying slower than an exponential [487]. This class of
subexponentials shares several important properties of heavy-tailed
distributions [146], not shared by exponentials or distributions decreasing
faster than exponentials: for instance, they have “fat tails” in the sense of
the asymptotic probability weight of the maximum compared with the sum of
large samples [167] (see also [451], Chaps. 1 and 6).

Notwithstanding their fat-tailedness, SE distributions have all their
moments finite,¹³ in contrast with regularly varying distributions for which
moments of order equal to or larger than the tail index b are not defined.
This property may provide a substantial advantage to exploit in
generalizations of the mean-variance portfolio theory using higher-order moments
(see for instance [6, 162, 241, 259, 333, 421, 453] among many others). In
addition, the existence of all moments is an important property allowing
for an efficient estimation of any high-order moment, since it ensures that
the estimators are asymptotically Gaussian. In particular, for
Stretched-Exponentially distributed random variables, the variance, skewness and
kurtosis can be accurately estimated, contrary to random variables with
regularly varying distribution with tail index in the range 3-5 [356].


3. The Exponential distribution:

$$
F_u(x) = 1 - \exp\left(-\frac{x}{d} + \frac{u}{d}\right) \tag{2.28}
$$

with parameter set (b = -1, c = 1, d > 0) and normalization constant
$A(b,c,d,u) = \frac{1}{d} \exp\left(\frac{u}{d}\right)$. For sufficiently high quantiles, the exponential
behavior can, for instance, derive from the hyperbolic model introduced
by Eberlein et al. [140] or from a simple model where stock price
dynamics is governed by a diffusion with stochastic volatility. Dragulescu and
Yakovenko [136] have found an excellent fit of the Dow Jones index for
time lags from 1 to 250 trading days with a model exhibiting an
asymptotic exponential tail of the distribution of log-returns.


4. The incomplete Gamma distribution:

$$
F_u(x) = 1 - \frac{\Gamma(-b,\, x/d)}{\Gamma(-b,\, u/d)} \tag{2.29}
$$

with parameter set (b, c = 1, d > 0) and normalization
$A(b,c,d,u) = \frac{d^{b}}{\Gamma(-b,\, u/d)}$. Such an asymptotic tail behavior can, for instance, be observed
for the generalized hyperbolic models, whose description can be found in
[393].


The Pareto distribution (PD) and Exponential distribution (ED) are
one-parameter families, whereas the Weibull/Stretched-Exponential (SE) and
the incomplete Gamma distribution (IG) are two-parameter families. The
comprehensive distribution (CD) given by (2.23) contains three unknown
parameters.

¹³ However, they do not admit an exponential moment, which leads to problems in
the reconstruction of the distribution from the knowledge of their moments [465].
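As a quick sanity check on the special cases above, the following minimal sketch (plain Python, with function names of our own choosing) codes the survival functions 1 - F_u(x) of the Pareto, Weibull/SE and Exponential models and verifies numerically that the density (2.23) with b = -c integrates to the Weibull distribution function. The midpoint integrator is only an illustrative helper, not part of the original analysis.

```python
import math

def pareto_sf(x, u, b):
    """Survival function 1 - F_u(x) of the Pareto model (2.26)."""
    return (u / x) ** b if x >= u else 1.0

def weibull_sf(x, u, c, d):
    """Survival function of the Weibull / Stretched-Exponential model (2.27)."""
    return math.exp(-(x / d) ** c + (u / d) ** c) if x >= u else 1.0

def exponential_sf(x, u, d):
    """Survival function of the Exponential model (2.28)."""
    return math.exp(-(x - u) / d) if x >= u else 1.0

def weibull_pdf(x, u, c, d):
    """Density (2.23) with b = -c, i.e. A = (c / d**c) * exp((u/d)**c)."""
    if x < u:
        return 0.0
    A = (c / d ** c) * math.exp((u / d) ** c)
    return A * x ** (c - 1.0) * math.exp(-(x / d) ** c)

def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule (illustrative helper for smooth integrands)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))
```

With c = 1 the Weibull survival function coincides with the Exponential one, which mirrors the statement that the ED is the restriction (b = -1, c = 1) of the family.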
Links between these different models reveal themselves under specific
asymptotic conditions. Very interesting is the behavior of the SE model when
c → 0 and u > 0. In this limit, and provided that

$$
c \left(\frac{u}{d}\right)^{c} \to \beta , \quad \text{as } c \to 0 , \tag{2.30}
$$

where β is a positive constant, the SE model tends to the Pareto model.
Indeed, we can write

$$
\begin{aligned}
\frac{c}{d^{c}}\, x^{c-1} \exp\left(-\frac{x^{c}-u^{c}}{d^{c}}\right)
&= c \left(\frac{u}{d}\right)^{c} \frac{x^{c-1}}{u^{c}}
   \exp\left[-\left(\frac{u}{d}\right)^{c}\left(\left(\frac{x}{u}\right)^{c} - 1\right)\right] \\
&\simeq \beta\, x^{-1} \exp\left[-c\left(\frac{u}{d}\right)^{c} \ln\frac{x}{u}\right] , \quad \text{as } c \to 0 \\
&\simeq \beta\, x^{-1} \exp\left[-\beta \ln\frac{x}{u}\right] \\
&= \beta\, \frac{u^{\beta}}{x^{\beta+1}} ,
\end{aligned}
\tag{2.31}
$$

which is the pdf of the Pareto model with tail index β. The condition (2.30)
comes naturally from the properties of the maximum likelihood estimator of
the scale parameter d given by (2.B.53) in Appendix 2.B. It implies that, as
c → 0, the characteristic scale d of the SE model must also go to zero with c
to ensure the convergence of the SE model toward the Pareto model.

The Pareto model with exponent β can therefore be approximated with
any desired accuracy on any finite interval [u, U], U > u > 0, by the SE
model with parameters (c, d) satisfying $c\,(u/d)^{c} = \beta$ (cf. (2.30), where the
arrow is replaced by an equality). Although the value c = 0 does not give,
strictly speaking, a SE distribution, the limit c → 0 provides any desired
approximation to the Pareto distribution, uniformly on any finite interval [u, U].
This deep relationship between the SE and PD models allows us to
understand why it can be very difficult to decide, on a statistical basis, which of
these models fits the data best.
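The convergence of the SE model toward the Pareto model can be illustrated numerically. The sketch below is a minimal illustration under our own choices (β = 3, interval [1, 10], a uniform grid): it ties the scale d to c through c (u/d)^c = β and measures the largest gap between the two survival functions. Because d underflows to zero for very small c, the code works with (u/d)^c = β/c directly rather than with d itself.

```python
import math

def se_sf(x, u, c, beta):
    """SE survival function with the scale d tied to c by c * (u/d)**c = beta.

    (x/d)**c - (u/d)**c = (u/d)**c * ((x/u)**c - 1) = (beta / c) * ((x/u)**c - 1)
    """
    return math.exp(-(beta / c) * ((x / u) ** c - 1.0))

def pareto_sf(x, u, beta):
    """Pareto survival function with tail index beta."""
    return (u / x) ** beta

def max_gap(c, u=1.0, U=10.0, beta=3.0, n=1000):
    """Largest absolute difference between the two survival functions on [u, U]."""
    xs = [u + (U - u) * i / n for i in range(n + 1)]
    return max(abs(se_sf(x, u, c, beta) - pareto_sf(x, u, beta)) for x in xs)

# Gap on [1, 10] for a decreasing sequence of exponents c.
gaps = {c: max_gap(c) for c in (0.5, 0.1, 0.01, 0.001)}
```

The gap shrinks roughly in proportion to c, which gives a feel for why samples of realistic size may not discriminate between a SE with small c and a Pareto law.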

Another interesting behavior is obtained in the limit b → +∞, where the
Pareto model tends to the exponential model [72]. Indeed, provided that the
scale parameter u of the power law is simultaneously scaled as $u^{b} = (b/\alpha)^{b}$,
we can write the tail of the cumulative distribution function of the PD as
$u^{b}/(u+x)^{b}$, which is indeed of the form $u^{b}/x^{b}$ for large x. Then,

$$
\frac{u^{b}}{(u+x)^{b}} = \left(1 + \frac{\alpha x}{b}\right)^{-b} \to \exp(-\alpha x)
\quad \text{for } b \to +\infty . \tag{2.32}
$$

This shows that the exponential model can be approximated with any desired
accuracy on intervals [u, u + A] by the PD model with parameters (b, u)
satisfying $u^{b} = (b/\alpha)^{b}$, for any positive constant A. Although the value
b → +∞ does not give, strictly speaking, an exponential distribution, the limit
u ∝ b → +∞ provides any desired approximation to the exponential distribution,
uniformly on any finite interval [u, u + A]. This limit is thus less general than
the SE → PD limit since it is valid only asymptotically for u → +∞, while
u can be finite in the SE → PD limit.
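A short expansion, given here as a sketch for fixed x with αx < b, makes the rate of convergence of the Pareto tail toward the exponential explicit:

```latex
\ln\left(1+\frac{\alpha x}{b}\right)^{-b}
  = -b\,\ln\left(1+\frac{\alpha x}{b}\right)
  = -\alpha x + \frac{(\alpha x)^{2}}{2b} - \frac{(\alpha x)^{3}}{3b^{2}} + \dots ,
\qquad\text{so that}\qquad
\left(1+\frac{\alpha x}{b}\right)^{-b}
  = e^{-\alpha x}\left[1 + \frac{(\alpha x)^{2}}{2b} + O\!\left(b^{-2}\right)\right].
```

The relative error grows like x²/b, which is consistent with the approximation being uniform only on intervals of bounded length A.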


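The Log-Weibull Family of Distributions


Another interesting family is the two-parameter log-Weibull family:

$$
F(x) = 1 - \exp\left[-b \left(\ln\frac{x}{u}\right)^{c}\right] , \quad \text{for } x \ge u , \tag{2.33}
$$

whose density is

$$
f_u(x \mid b, c) =
\begin{cases}
\dfrac{b\,c}{x} \left(\ln\dfrac{x}{u}\right)^{c-1} \exp\left[-b\left(\ln\dfrac{x}{u}\right)^{c}\right] & \text{if } x \ge u > 0, \\
0 & \text{if } x < u.
\end{cases}
\tag{2.34}
$$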


This family of pdfs interpolates smoothly between the SE and the Pareto
classes. It recovers the Pareto family for c = 1, in which case the parameter b
is the tail exponent. For c larger than one, the tail of the log-Weibull is thinner
than any Pareto distribution but heavier than any Stretched-Exponential.¹⁴ In
particular, when c equals two, the log-normal distribution is retrieved (above
threshold u). For c smaller than one, the tails of the log-Weibull distributions
are even heavier than any regularly varying distribution. It is interesting to
note that in this case the log-Weibull distributions do not belong to the
domain of attraction of a law of the maximum. Therefore, the standard extreme
value theory cannot apply to such distributions. If it turned out that
log-Weibull distributions with an index c < 1 provide a reasonable description
of the tails of distributions of returns, this would mean that risk management
methods based upon EVT are particularly unreliable (see below).
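The interpolation property of the log-Weibull family is easy to check numerically. The following sketch (function names are ours) implements the survival function associated with (2.33) together with its inverse, which can also serve to draw synthetic samples by the inverse-transform method.

```python
import math

def log_weibull_sf(x, u, b, c):
    """Survival function 1 - F(x) = exp(-b * ln(x/u)**c) for x >= u, cf. (2.33)."""
    return math.exp(-b * math.log(x / u) ** c) if x >= u else 1.0

def log_weibull_quantile(p, u, b, c):
    """Inverse of F: the value x such that F(x) = p, for 0 <= p < 1."""
    return u * math.exp((-math.log(1.0 - p) / b) ** (1.0 / c))
```

For c = 1 the survival function reduces to (u/x)^b, the Pareto case. At a fixed large x and equal b, the tail probability decreases as c increases, in line with the ordering described in the text.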


2.4.2 Parameter Estimation Using Maximum Likelihood 
and Anderson-Darling Distance 


It is instructive to fit the two data sets used in Sect. 2.3.4 (i.e., the Dow Jones
daily returns and the Nasdaq 5-minute returns), in addition to a sample of
returns of the Standard & Poor’s 500¹⁵ over the two decades 1980-1999, by the
distributions enumerated above (2.23), (2.26-2.29) and (2.34). We will show
that no single parametric representation among any of the cited pdfs fits the
whole range of the data sets. Positive and negative returns will be analyzed
separately, the latter being converted to the positive semi-axis. The analysis
uses a movable lower threshold u, restricting by this threshold the study to
the observations satisfying the condition x ≥ u.

¹⁴ A generalization of the log-Weibull distributions to the following three-parameter
family also contains the SE family in some formal limit. Consider indeed
$1 - F(x) = \exp(-b(\ln(1 + x/D))^{c})$ for x > 0, which has the same tail as expression
(2.33). Taking D → +∞ together with $b = (D/d)^{c}$ with d finite yields
$1 - F(x) = \exp(-(x/d)^{c})$.

¹⁵ The returns on the Standard & Poor’s 500 are calculated at five different time
scales: 1 minute, 5 minutes, 30 minutes, an hour and 1 day.

In addition to estimating the parameters involved in each representation
(2.23, 2.26-2.29, 2.34) by maximum likelihood¹⁶ for each particular threshold
u, it is important to characterize the goodness-of-fit. There are many
measures of goodness-of-fit; a natural class consists of the distances between the
estimated distribution and the sample distribution. Many distances can be
used: mean-squared error, Kullback-Leibler distance,¹⁷ Kolmogorov distance,
Sherman distance (as in [312]) or Anderson-Darling distance, to cite a few.
The parameters of each pdf can also be determined according to the criterion
of minimizing the distance between the estimated distribution and the sample
distribution. The chosen distance is thus useful both for characterizing and
for estimating the parametric pdf. In this case, once an estimation of the
parameters of a particular distribution family has been obtained according to
the selected distance, the quantification of the statistical significance of the
fit requires deriving the statistics associated with the chosen distance. These
statistics are known for most of the examples cited above, in the limit of large
samples.

In addition to the maximum likelihood method (which is associated, as we
said, with the Kullback-Leibler distance), it is instructive to use the
Anderson-Darling distance to estimate the parameters and perform the tests of
goodness-of-fit. The Anderson-Darling distance between a theoretical distribution
function F(x) and its empirical analog $F_N(x)$, estimated from a sample of N
realizations, is defined by

$$
\mathrm{ADS} = N \int \frac{[F_N(x) - F(x)]^{2}}{F(x)\,(1 - F(x))}\, dF(x) \tag{2.35}
$$

and evaluated as

$$
\mathrm{ADS} = -N - 2 \sum_{k=1}^{N} \left\{ w_k \log(F(x_k)) + (1 - w_k) \log(1 - F(x_k)) \right\} , \tag{2.36}
$$

where $w_k = 2k/(2N+1)$, k = 1...N and $x_1 \le \dots \le x_N$ is its ordered
sample. If the sample is drawn from a population with distribution function
F(x), the Anderson-Darling statistic (ADS) has a standard AD-distribution
free of the theoretical distribution function F(x) [11], similarly to the χ² for
the χ²-statistic, or the Kolmogorov distribution for the Kolmogorov
statistic. It should be noted that the ADS weights $[F_N(x) - F(x)]^{2}$ in (2.35) by
$N/[F(x)(1 - F(x))]$, which is nothing but the inverse of its variance. Thus, the
AD distance puts more emphasis on the tails of the distribution than, say, the
Kolmogorov distance, which is determined by the maximum absolute deviation of
$F_N(x)$ from F(x), or the mean-squared error, which is mostly controlled by
the middle range of the distribution.

¹⁶ The estimators and their asymptotic properties are summarized in Appendix 2.B.

¹⁷ This distance (or divergence) is the natural distance associated with the
maximum likelihood estimation since it is for the maximum likelihood values that the
distance between the true model and the assumed model reaches its minimum.
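A minimal sketch of how the Anderson-Darling statistic can be evaluated from an ordered sample is given below. It follows the structure of the summation formula above, but, as an assumption on our part, it uses the weights (2k - 1)/(2N) of the standard textbook computing formula, which are close to but not identical with the w_k quoted in the text. The function name and the `cdf` callable are illustrative.

```python
import math

def anderson_darling(sample, cdf):
    """Anderson-Darling statistic of `sample` against the distribution function `cdf`.

    Evaluates -N - 2 * sum_k { w_k log F(x_k) + (1 - w_k) log(1 - F(x_k)) }
    over the ordered sample, with w_k = (2k - 1) / (2N) (standard weights,
    assumed here; see the lead-in).
    """
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        F = cdf(x)
        w = (2 * k - 1) / (2 * n)
        total += w * math.log(F) + (1.0 - w) * math.log(1.0 - F)
    return -n - 2.0 * total
```

For exceedances over a threshold u under the Exponential model, for example, one could pass `cdf=lambda x: 1.0 - math.exp(-(x - u) / d)` with an estimated scale d. Values of F equal to exactly 0 or 1 would make the logarithms diverge, so the sample should lie strictly inside the support.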

Since we have to insert the estimated parameters into the ADS, this
statistic no longer obeys the standard AD-distribution: the ADS decreases
because the use of the fitting parameters ensures a better fit to the sample
distribution (we will come back to this topic in more detail in Chap. 5).
However, we can still use the standard quantiles of the AD-distribution as
upper boundaries of the ADS. If the observed ADS is larger than the standard
quantile with a high significance level (1 - ε), we can then conclude that the
null hypothesis F(x) is rejected with a significance level larger than (1 - ε). If
one wishes to estimate the real significance level of the ADS in the case where
it does not exceed the standard quantile of a high significance level, one is
forced to use some other method, such as the bootstrap method.

In the following, the estimates minimizing the Anderson-Darling distance
will be referred to as AD-estimates. The maximum likelihood estimates
(ML-estimates) are asymptotically more efficient than AD-estimates for
independent data and under the condition that the null hypothesis (given by one of
the four distributions (2.26-2.29), for instance) corresponds to the true
data-generating model. When this is not the case, the AD-estimates can provide
a better practical tool for approximating sample distributions compared with
the ML-estimates. These estimates will be reported for the thresholds $u(q_k)$
determined by the probability levels q_1 = 0, q_2 = 0.1, q_3 = 0.2, q_4 = 0.3,
q_5 = 0.4, q_6 = 0.5, q_7 = 0.6, q_8 = 0.7, q_9 = 0.8, q_10 = 0.9, q_11 = 0.925,
q_12 = 0.95, q_13 = 0.96, q_14 = 0.97, q_15 = 0.98, q_16 = 0.99, q_17 = 0.9925,
q_18 = 0.995, q_19 = 0.999, q_20 = 0.9995 and q_21 = 0.9999.

Although the threshold $u(q_k)$ varies from sample to sample, it
always corresponds to the same fixed probability level $q_k$, which allows one
to compare the goodness-of-fit for samples of different sizes. In the statistics
presented below, only subsamples with at least 100 data points or so are
considered, in order to allow for a sufficiently accurate assessment of the quantile
under consideration.
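The construction of the thresholds can be sketched as follows. The code is only an illustration under our own conventions: the threshold for a level q is taken as a simple order statistic (no interpolation), and the exceedances are the observations at or above it. The cut-off of 100 points follows the rule of thumb stated above.

```python
# The 21 probability levels q_1, ..., q_21 listed in the text.
LEVELS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
          0.925, 0.95, 0.96, 0.97, 0.98, 0.99, 0.9925, 0.995,
          0.999, 0.9995, 0.9999]

def threshold(sorted_data, q):
    """Empirical threshold u(q): an order statistic leaving a fraction ~q below it."""
    n = len(sorted_data)
    return sorted_data[min(int(q * n), n - 1)]

def exceedances_by_level(data, levels=LEVELS, min_size=100):
    """Return (k, q_k, u(q_k), subsample) for each level whose subsample is large enough."""
    xs = sorted(data)
    kept = []
    for k, q in enumerate(levels, start=1):
        u = threshold(xs, q)
        tail = [x for x in xs if x >= u]
        if len(tail) >= min_size:   # skip subsamples too small for a reliable fit
            kept.append((k, q, u, tail))
    return kept
```

With 10,000 distinct observations, for instance, the levels beyond q_16 = 0.99 leave fewer than 100 exceedances and are dropped, which shows why the highest levels are only reported for the largest data sets.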


2.4.3 Empirical Results on the Goodness-of-Fits 


The Anderson-Darling statistics (ADS) for four parametric distributions
(Weibull or Stretched-Exponential, Exponential, Pareto and Log-Weibull) are
shown in Table 2.4 for two quantile ranges, the top half of the table
corresponding to the 90% lowest thresholds while the bottom half corresponds
to the 10% highest ones. For the lowest thresholds, the ADS rejects
all distributions at the 95% confidence level, except the SE for the
negative tail of the Standard & Poor’s 500 for the 60-minute returns and for the
Nasdaq. Thus, none of the considered distributions is adequate to model the
data over such large ranges. For the 10% highest quantiles, the exponential
model is rejected at the 95% confidence level except for the negative tails of
the Dow Jones (daily returns) and the Nasdaq. The Log-Weibull and the SE
distributions are the best since they are only rejected at the 1-minute time
scale for the Standard & Poor’s 500. The Pareto distribution provides a
reliable description for time scales larger than or equal to 30 minutes. However,
it remains less accurate than the log-Weibull and the SE distributions, on
average. Overall, it can be noted that the Nasdaq and the 60-minute returns
of the Standard & Poor’s 500 behave very similarly. Let us now analyze each
distribution in more detail.


Table 2.4. Mean Anderson-Darling distances in the range of thresholds u(q_1)-u(q_9) (90% lowest thresholds) and in the range
u(q) ≥ u(q_10) (10% highest thresholds)

                      S&P 500 1 min                     S&P 500 5 min                     S&P 500 30 min
               Pos. tail        Neg. tail        Pos. tail        Neg. tail        Pos. tail        Neg. tail
Mean AD-statistics for u_1-u_9
Weibull        292.85 (100%)    299.46 (100%)    36.62 (100%)     41.04 (100%)     7.36 (100%)      4.84 (100%)
Exponential    771.70 (100%)    718.56 (100%)    86.79 (100%)     108.17 (100%)    17.47 (100%)     16.36 (100%)
Pareto         23998.94 (100%)  23337.60 (100%)  6834.06 (100%)   6563.26 (100%)   1847.40 (100%)   1298.47 (100%)
Log-Weibull    1559.70 (100%)   1470.11 (100%)   360.18 (100%)    331.45 (100%)    60.03 (100%)     67.22 (100%)
Mean AD-statistics for u ≥ u_10
Weibull        6.80 (100%)      5.80 (100%)      1.81 (88%)       1.93 (90%)       0.67 (42%)       0.79 (51%)
Exponential    143.97 (100%)    136.66 (100%)    28.12 (100%)     30.88 (100%)     8.19 (100%)      9.75 (100%)
Pareto         19.97 (100%)     19.24 (100%)     8.10 (100%)      7.61 (100%)      1.63 (85%)       1.77 (88%)
Log-Weibull    3.60 (99%)       4.10 (99%)       1.20 (73%)       1.55 (84%)       0.64 (39%)       0.42 (17%)

                      S&P 500 60 min                    Nasdaq                            Dow Jones
               Pos. tail        Neg. tail        Pos. tail        Neg. tail        Pos. tail        Neg. tail
Mean AD-statistics for u_1-u_9
Weibull        3.58 (99%)       2.36 (94%)       1.37 (80%)       0.85 (55%)       4.96 (100%)      3.86 (99%)
Exponential    8.12 (100%)      12.20 (100%)     5.41 (100%)      3.33 (98%)       16.48 (100%)     10.30 (100%)
Pareto         1001.68 (100%)   702.47 (100%)    475.00 (100%)    441.40 (100%)    691.30 (100%)    607.30 (100%)
Log-Weibull    34.44 (100%)     36.55 (100%)     35.90 (100%)     30.92 (100%)     32.30 (100%)     28.27 (100%)
Mean AD-statistics for u ≥ u_10
Weibull        0.66 (41%)       0.68 (42%)       0.67 (42%)       0.50 (29%)       0.38 (13%)       0.35 (10%)
Exponential    4.99 (100%)      4.89 (100%)      3.06 (97%)       1.97 (90%)       3.06 (97%)       1.89 (89%)
Pareto         1.12 (70%)       1.28 (76%)       1.30 (78%)       1.33 (78%)       0.78 (50%)       1.26 (75%)
Log-Weibull    0.48 (23%)       0.57 (32%)       0.46 (29%)       0.49 (30%)       0.38 (13%)       0.69 (43%)

The figures within parentheses characterize the goodness of fit: they represent the significance levels with which the considered
model can be rejected. Note that these significance levels are only lower bounds since one or two parameters are fitted. Reproduced
from [330]


Pareto Distribution 


Figures 2.3 and 2.6 show the complementary sample distribution functions
1 - F(x) for the Standard & Poor’s 500 index at the 30-minute time scale
and for the daily Dow Jones Industrial Average index, respectively. In Fig. 2.6,
the mismatch between the Pareto distribution and the data can be seen with
the naked eye: even in the tails, one observes a continuous downward curvature
in the double logarithmic diagram, instead of a straight line as would be the
case if the distribution ultimately behaved like a Pareto law. To formalize this
impression, we calculate the ML and AD estimators for each threshold u. For
the Pareto law, the ML estimator is well known to agree with Hill’s estimator.
Indeed, denoting $x_1 \ge \dots \ge x_{N_u}$ the ordered subsample of values exceeding u,
where $N_u$ is the size of this subsample, the Hill maximum likelihood estimate
of the parameter b is [233]

$$
\hat{b}_u = \left[\frac{1}{N_u} \sum_{k=1}^{N_u} \ln\frac{x_k}{u}\right]^{-1} . \tag{2.37}
$$

Its standard deviation can be asymptotically estimated as

$$
\mathrm{Std}(\hat{b}_u) = \frac{\hat{b}_u}{\sqrt{N_u}}
$$

under the assumption of iid data, but this very severely underestimates the true
standard deviation when samples exhibit dependence, as reported by Kearns
and Pagan [267] (see the previous section of this chapter).

[Figure 2.6: log-log plot; horizontal axis: absolute log-return x; vertical axis:
complementary sample DF.]

Fig. 2.6. Complementary sample distribution function for the daily returns of the
Dow Jones index over the time period from 1900-2000. The plain (resp. dotted) line
shows the complementary distribution for the positive (resp. the absolute value of
negative) returns. Reproduced from [330]

[Figure 2.7: two panels; horizontal axis: lower threshold u; vertical axis: Hill
estimate of b.]

Fig. 2.7. Hill estimate $\hat{b}_u$ as a function of the threshold u for the Dow Jones (left
panel) and for the Standard & Poor’s 500 1-minute returns (right panel)
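The Hill estimate and its asymptotic standard deviation under the iid assumption can be sketched in a few lines. The function name and return convention are ours. The standard deviation returned is the iid value b/√N_u, which, as stressed above, can be far too small for dependent data.

```python
import math

def hill_estimate(data, u):
    """Hill (maximum likelihood) estimate of the Pareto exponent b above threshold u.

    Returns (b_hat, asymptotic iid standard deviation, number of exceedances).
    """
    tail = [x for x in data if x > u]
    n_u = len(tail)
    if n_u == 0:
        raise ValueError("no observation exceeds the threshold u")
    # b_hat is the inverse of the mean log-excess ln(x_k / u) over the tail.
    mean_log = sum(math.log(x / u) for x in tail) / n_u
    b_hat = 1.0 / mean_log
    return b_hat, b_hat / math.sqrt(n_u), n_u
```

Applying `hill_estimate` over a grid of thresholds u gives the kind of curve discussed in the text: for a true Pareto sample it fluctuates around a constant, whereas a systematic drift with u signals a departure from a pure power law.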

Figure 2.7 shows the Hill estimates $\hat{b}_u$ as a function of u for the Dow Jones
and for the Standard & Poor’s 500 1-minute returns. Instead of an
approximately constant exponent (as would be the case for true Pareto samples),
the tail index estimator, for the Dow Jones, increases until u = 0.04, beyond
which it seems to slow its growth and oscillates around a value ≈ 3-4 up to
the threshold u ≈ 0.08. It should be noted that the interval [0, 0.04] contains
99.12% of the sample whereas the interval [0.04, 0.08] contains only 0.64% of
the sample. The behavior of $\hat{b}_u$ is very similar for the Nasdaq (not shown).
The behavior of $\hat{b}_u$ for the Standard & Poor’s 500 shown on the right panel of
Fig. 2.7 is somewhat different: Hill’s estimate $\hat{b}_u$ slows its growth at u = 0.006,
corresponding to the 95% quantile, then decays until u = 0.05 (99.99%
quantile) and then strongly increases again. Are these slowdowns of the growth
of $\hat{b}_u$ genuine signatures of a possible constant well-defined asymptotic value
that would qualify a regularly varying function?

To answer this question, let us have a look at Fig. 2.8 which shows the Hill
estimator $\hat{b}_u$ for all data sets (positive and negative branches of the
distribution of returns for the Dow Jones, the Nasdaq and the Standard & Poor’s 500
(SP)) as a function of the index n = 1, 2, ..., 18 of the quantiles or standard
significance levels q_1, ..., q_18. Similar results are obtained with the AD
estimates. The three branches of the distribution of returns for the Dow Jones
and the negative tail of the Nasdaq suggest a continuous growth of the Hill
estimator $\hat{b}_u$ as a function of n = 1, ..., 18. However, it turns out that this
apparent growth may be explained solely on the basis of statistical
fluctuations and slow convergence to a moderate b-value. Indeed, the two thick lines
show the 95% confidence bounds obtained from synthetic time series of 10000
data points generated with a Student distribution with exponent b = 3.5. It is
clear that the growth of the upper bound can explain the observed behavior
of the b-value obtained for the Dow Jones and Nasdaq data. It would thus be
incorrect to extrapolate this apparent growth of the b-value. However,
conversely, we cannot conclude with certainty that the growth of the b-value has
been exhausted and that we have access to the asymptotic value. Indeed, large
values of tail indices are for instance predicted by traditional GARCH models
giving b ≈ 10-20 [153, 463].

Fig. 2.8. Hill estimator $\hat{b}_u$ for all sets (positive and negative branches of the
distribution of returns for the Dow Jones (DJ), Nasdaq (ND) and Standard &
Poor’s 500 (SP)) as a function of the index n = 1, ..., 18 of the 18 quantiles or
standard significance levels q_1, ..., q_18 given in Table 6.3. The two thick lines (in red)
show the 95% confidence bounds obtained from synthetic time series of 10000 data
points generated with a Student distribution with exponent b = 3.5. Reproduced
from [330]
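Confidence bounds of the kind described above can be obtained by simulation. The sketch below is our own illustration, not the procedure of [330]: the number of replications, the use of absolute values to form a one-sided tail, and the order-statistic thresholds are all assumptions. It draws Student-distributed samples with 3.5 degrees of freedom, computes the Hill estimate above each of the 18 standard levels, and takes empirical 2.5% and 97.5% quantiles across replications.

```python
import math
import random

# The 18 standard levels q_1, ..., q_18.
LEVELS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
          0.925, 0.95, 0.96, 0.97, 0.98, 0.99, 0.9925, 0.995]

def student_abs_sample(n, nu, rng):
    """Absolute values of n Student-t draws with nu degrees of freedom.

    Uses t = Z / sqrt(G / nu), with Z standard normal and G chi-square(nu),
    the latter drawn as a Gamma variable with shape nu/2 and scale 2.
    """
    return [abs(rng.gauss(0.0, 1.0)) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
            for _ in range(n)]

def hill_profile(sample):
    """Hill estimate above the empirical threshold u(q) for each level q."""
    xs = sorted(sample)
    n = len(xs)
    profile = []
    for q in LEVELS:
        idx = min(int(q * n), n - 2)   # keep at least two points in the tail
        u = xs[idx]
        tail = xs[idx:]
        mean_log = sum(math.log(x / u) for x in tail) / len(tail)
        profile.append(1.0 / mean_log)
    return profile

def hill_bounds(n=10000, nu=3.5, reps=40, seed=0):
    """Empirical 95% band of the Hill profile across `reps` synthetic samples."""
    rng = random.Random(seed)
    profiles = [hill_profile(student_abs_sample(n, nu, rng)) for _ in range(reps)]
    lower, upper = [], []
    for k in range(len(LEVELS)):
        col = sorted(p[k] for p in profiles)
        lower.append(col[int(0.025 * reps)])
        upper.append(col[min(int(0.975 * reps), reps - 1)])
    return lower, upper
```

Even though the true tail exponent is fixed at 3.5, the band drifts upward with the quantile index, which is the mechanism invoked in the text to caution against extrapolating the apparent growth of the estimated b.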
We now present the results of the fits of the same data with the SE distribution
(2.27). The corresponding Anderson-Darling statistics (ADS) are shown in
Table 2.4. The ML-estimates and AD-estimates of the form parameter c are
reported in Table 2.5. Table 2.4 shows that, for the highest quantiles, the
ADS for the SE is the smallest of all ADS, suggesting that the SE is the
best model of all. Moreover, for the lowest quantiles, it is the sole model not
systematically rejected at the 95% level.

The c-estimates are found to decrease when increasing the order q of the 
threshold u(q) beyond which the estimations are performed. In addition, sev- 
eral c-estimates are found very close to zero. However, this does not auto- 
matically imply that the SE model is not the correct model for the data even 
for these highest quantiles. Indeed, numerical simulations show that, even 
for synthetic samples drawn from genuine SE distributions with exponent c 
smaller than 0.5 and whose size is comparable with that of our data, in about 
one case out of three (depending on the exact value of c) the estimated value 
of c is zero. This a priori surprising result comes from condition (2.B.57) in
Appendix 2.B which is not fulfilled with certainty even for samples drawn from
SE distributions.

Notwithstanding this cautionary remark, note that the c-estimate of the
positive tail of the Nasdaq data equals zero for all quantiles higher than
q_14 = 97%. In fact, in every case, the estimated c is not significantly different
from zero (at the 95% significance level) for quantiles higher than q_12-q_14,
except for quantile q_21 of the negative tail of the Standard & Poor’s 500,
but this value is probably doubtful. In addition, the values of the estimated
scale parameter d, not reported here, are found to be very small, particularly for
the Nasdaq (beyond q_12 = 95%) and the S&P 500 (beyond q_10 = 90%). In
contrast, the Dow Jones keeps significant scale factors until q_16-q_17.

Taken together, these pieces of evidence provide a clear indication of a
change of behavior of the true pdf of these distributions: while the
bulks of the distributions seem rather well approximated by a SE model, a
distribution with a tail fatter than that of the SE model is required for the
highest quantiles. Actually, the fact that both c and d are extremely small may
be interpreted, according to the asymptotic correspondence given by (2.30) and
(2.31), as the existence of a possible power law tail.

At this stage, we can make the following conservative statement: the true
distribution of returns is probably bracketed by a power law, as a lower bound,
and a SE, as an upper bound. It is therefore particularly interesting to focus
on distributions such as log-Weibull distributions which interpolate between 
these two classes in order to obtain — hopefully — a better description of the 
data. 


Table 2.5. Maximum likelihood (MLE) and Anderson-Darling (ADE) estimates of the form parameter c of the Weibull (Stretched- 
Exponential) distribution 


S&P 500 (1 min) Nasdaq Dow Jones 
Pos. tail Neg. tail Pos. tail Neg. tail Pos. tail Neg. tail 

MLE ADE MLE ADE MLE ADE MLE ADE MLE ADE MLE ADE 
q: 1.065 (0.001) 1.175 1.051 (0.001) 1.158 1.007 (0.008) 1.053 0.987 (0.008) 1.017 1.040 (0.007) 1.104 0.975 (0.007) 1.026 
q2 0.927 (0.002) 1.049 0.915 (0.002) 1.035 0.983 (0.011) 1.051 0.953 (0.011) 0.993 0.973 (0.010) 1.075 0.910 (0.010) 0.989
q3 0.8754 (0.002) 1.0196 0.8634 (0.002) 1.0027 0.944 (0.014) 1.031 0.912 (0.014) 0.955 0.931 (0.013) 1.064 0.856 (0.012) 0.948
q4 0.813 (0.002) 0.970 0.799 (0.002) 0.947 0.896 (0.018) 0.995 0.876 (0.018) 0.916 0.878 (0.015) 1.038 0.821 (0.015) 0.933
q5 0.763 (0.003) 0.952 0.752 (0.003) 0.932 0.857 (0.021) 0.978 0.861 (0.021) 0.912 0.792 (0.019) 0.955 0.767 (0.018) 0.889
q6 0.733 (0.003) 0.985 0.727 (0.003) 0.971 0.790 (0.026) 0.916 0.833 (0.026) 0.891 0.708 (0.023) 0.873 0.698 (0.022) 0.819
q7 0.593 (0.004) 0.799 0.590 (0.004) 0.791 0.732 (0.033) 0.882 0.796 (0.033) 0.859 0.622 (0.028) 0.788 0.612 (0.028) 0.713
q8 0.504 (0.005) 0.740 0.502 (0.005) 0.730 0.661 (0.042) 0.846 0.756 (0.042) 0.834 0.480 (0.035) 0.586 0.531 (0.035) 0.597
q9 0.337 (0.007) 0.537 0.342 (0.007) 0.531 0.509 (0.058) 0.676 0.715 (0.059) 0.865 0.394 (0.047) 0.461 0.478 (0.047) 0.527
q10 0.152 (0.010) 0.394 0.159 (0.010) 0.387 0.359 (0.092) 0.631 0.522 (0.099) 0.688 0.304 (0.074) 0.346 0.403 (0.076) 0.387
q11 0.079 (0.012) 0.327 0.091 (0.012) 0.339 0.252 (0.110) 0.515 0.481 (0.120) 0.697 0.231 (0.087) 0.158 0.379 (0.091) 0.337
q12 < 10^-8 0.151 < 10^-8 0.169 0.039 (0.138) 0.177 0.273 (0.155) 0.275 0.269 (0.111) 0.207 0.357 (0.119) 0.288
q13 < 10^-8 0.0793 < 10^-8 0.084 0.057 (0.155) 0.233 0.255 (0.177) 0.274 0.253 (0.127) 0.147 0.428 (0.136) 0.465
q14 < 10^-8 0.008 < 10^-8 0.020 < 10^-8 0 0.215 (0.209) 0.194 0.290 (0.150) 0.174 0.448 (0.164) 0.641
q15 < 10^-8 0.008 < 10^-8 0.008 < 10^-8 0 0.103 (0.260) 0 0.379 (0.192) 0.407 0.451 (0.210) 0.863
q16 < 10^-8 0.008 < 10^-8 0.008 9.6 × 10^-8 0 0.064 (0.390) os 0.398 (0.290) 0.382 0.022 (0.319) 0.110
q17 < 10^-8 0.008 < 10^-8 0.008 < 10^-8 0 0.158 (0.452) 0 0.307 (0.346) 0.255 0.178 (0.367) 0.703
q18 < 10^-8 0.008 < 10^-8 0.008 < 10^-8 0 < 10^-8 a 2 x10-* 0 < 10^-8 0
q19 0.035 (0.082) 0.007 0.009 (0.032) 0.007 - - - - - - - -
q20 0.111 (0.119) 0.075 0.316 (0.117) 0.007 - - - - - - - -
q21 < 10^-8 0.008 0.827 (0.393) 0.900 - - - - - - - -


Reproduced from [330] 
Log-Weibull Distributions


The parameters b and c of the log-Weibull distribution defined by (2.33) are
estimated with both the maximum likelihood and Anderson-Darling methods
for the 18 standard significance levels q1, ..., q18 (given on page 62) for the
Dow Jones and Nasdaq data and up to q21 for the Standard & Poor’s 500
data. The results for the Dow Jones and the Standard & Poor’s 500 are
given in Table 2.6. For both positive and negative tails of the Dow Jones, the
results are very stable for all quantiles lower than q10: c = 1.09 ± 0.02 and
b = 2.71 ± 0.07. These results reject the Pareto distribution degeneracy c = 1
at the 95% confidence level. Only for the quantiles higher than or equal to
q18 is an estimated value of c compatible with the Pareto distribution found.
Moreover, for both the positive and negative Dow Jones tails, one finds that
c ≈ 0.92 and b ≈ 3.6-3.8, suggesting either a possible change of regime or
a sensitivity to “outliers” or a lack of robustness due to too small a sample
size. For the positive Nasdaq tail, the exponent c is found compatible with
c = 1 (the Pareto value), at the 95% significance level, above q11 while b
remains almost stable at b ≈ 3.2. For the negative Nasdaq tail, we find that c
decreases almost systematically from 1.1 for q10 to 1 for q18 for both estimators
while b regularly increases from about 3.1 to about 4.2. The Anderson-Darling
distances are significantly better than for the SE and this statistic cannot be
used to conclude either in favor of or against the log-Weibull class.

The situation is different for the Standard & Poor’s 500 (1-min). For 
the positive tail, the parameter c remains significantly smaller than 1 from 
q14 = 97% to q21 except for q19 and q20. Therefore, it seems that for very small
time scales, the tails of the distribution of returns might be even fatter than a 
power law. As stressed in Sect. 2.4.1, when c is less than one, the log-Weibull 
distribution does not belong to the domain of attraction of a law of the max- 
imum. As a consequence, EVT cannot provide reliable results when applied 
to such data, neither from a theoretical point of view nor from a practical 
stance (e.g. extreme risk assessment). The conclusions are the same for the 
5-minute time scale. For the 30-minute and 60-minute time scales, c remains 
systematically less than one for the highest quantiles but this difference ceases 
to be significant. In the negative tail, the situation is overall the same. 
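
The maximum likelihood step used for Table 2.6 can be sketched as follows. This is a minimal illustration, not the procedure of [330]: it assumes that the log-Weibull tail (2.33) has the survival function 1 - F(x) = exp[-b (ln(x/u))^c] for x > u (a form consistent with c = 1 giving a Pareto tail of exponent b), so that y = ln(x/u) is Weibull distributed. The function names, the bisection bracket and the simulated parameter values are hypothetical.

```python
import math
import random


def fit_log_weibull(sample, u):
    """MLE of (b, c) for the assumed tail 1 - F(x) = exp(-b * ln(x/u)**c), x > u.

    With y = ln(x/u), the sample y is Weibull with survival exp(-b * y**c).
    """
    y = [math.log(x / u) for x in sample if x > u]
    n = len(y)
    log_y = [math.log(v) for v in y]
    mean_log_y = sum(log_y) / n

    def score(c):
        # Derivative in c of the profile log-likelihood, divided by n.
        w = [v ** c for v in y]
        weighted = sum(wi * li for wi, li in zip(w, log_y)) / sum(w)
        return 1.0 / c + mean_log_y - weighted

    lo, hi = 1e-3, 20.0  # illustrative bracket; the score decreases with c
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    c_hat = 0.5 * (lo + hi)
    b_hat = n / sum(v ** c_hat for v in y)  # closed form once c is known
    return b_hat, c_hat


def simulate_log_weibull(n, b, c, u, seed=0):
    """Draw n values above u by inverting the assumed survival function."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e = -math.log(1.0 - rng.random())  # unit exponential variate
        out.append(u * math.exp((e / b) ** (1.0 / c)))
    return out
```

Fitting a simulated sample with known (b, c) is a quick sanity check before applying the estimator to returns exceeding a threshold u; an estimate of c close to 1 then points to a Pareto-like tail.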


2.4.4 Comparison of the Descriptive Power 
of the Different Families 


The previous sections have shown that none of the considered distributions 
(2.26-2.29) and (2.34) fit the data over the entire range, which is not a surprise. 
For the highest quantiles, several models seem to be able to represent the data, 
including the Pareto model, the SE model and the log-Weibull model discussed 
above. The last two models seem to be the most reasonable models among 
the models compatible with the data. For all the samples, their Anderson- 
Darling statistics remain so close to each other for the highest quantiles that 
the descriptive power of these two models cannot be distinguished. 
Table 2.6. Maximum likelihood (MLE) and Anderson-Darling (ADE) estimates
of the parameters b and c of the log-Weibull distribution defined by (2.33)


     Dow Jones (1 day) Positive tail              Dow Jones (1 day) Negative tail
     MLE                           ADE            MLE                           ADE
     c              b              c      b       c              b              c      b

q1   5.262 (0.005)  0.000 (0.000)  5.55   0.000   5.085 (0.005)  0.000 (0.000)  5.320  0.000
q2   2.140 (0.009)  0.241 (0.002)  2.25   0.220   2.125 (0.009)  0.211 (0.002)  2.240  0.191
q3   1.790 (0.010)  0.531 (0.005)  1.87   0.510   1.751 (0.010)  0.495 (0.005)  1.800  0.481
q4   1.616 (0.012)  0.830 (0.008)  1.65   0.820   1.593 (0.012)  0.744 (0.008)  1.630  0.735
q5   1.447 (0.012)  1.165 (0.012)  1.47   1.160   1.459 (0.013)  1.022 (0.011)  1.480  1.015
q6   1.339 (0.012)  1.472 (0.017)  1.36   1.473   1.353 (0.013)  1.311 (0.016)  1.370  1.311
q7   1.259 (0.013)  1.768 (0.023)  1.28   1.773   1.269 (0.014)  1.609 (0.022)  1.270  1.610
q8   1.173 (0.013)  2.097 (0.031)  1.17   2.096   1.188 (0.015)  1.885 (0.030)  1.190  1.887
q9   1.125 (0.015)  2.362 (0.043)  1.12   2.358   1.158 (0.017)  2.178 (0.042)  1.150  2.174
q10  1.090 (0.020)  2.705 (0.070)  1.08   2.695   1.087 (0.022)  2.545 (0.069)  1.090  2.545
q11  1.035 (0.022)  2.771 (0.083)  1.03   2.762   1.074 (0.024)  2.688 (0.085)  1.070  2.681
q12  1.047 (0.027)  2.867 (0.105)  1.04   2.857   1.068 (0.029)  2.880 (0.111)  1.050  2.857
q13  1.046 (0.030)  2.960 (0.121)  1.03   2.933   1.067 (0.032)  2.900 (0.125)  1.080  2.924
q14  1.044 (0.034)  3.000 (0.142)  1.03   2.976   1.132 (0.038)  3.171 (0.158)  1.120  3.155
q15  1.090 (0.043)  3.174 (0.184)  1.09   3.165   1.163 (0.047)  3.439 (0.209)  1.180  3.472
q16  1.085 (0.059)  3.424 (0.280)  1.09   3.425   1.025 (0.056)  3.745 (0.322)  1.010  3.731
q17  1.093 (0.066)  3.666 (0.345)  1.09   3.650   1.108 (0.069)  3.822 (0.380)  1.120  3.891
q18  0.935 (0.071)  3.556 (0.411)  0.902  3.484   0.921 (0.071)  3.804 (0.461)  0.933  3.846

     S&P 500 (1 min) Positive tail                S&P 500 (1 min) Negative tail
     MLE                           ADE            MLE                           ADE
     c              b              c      b       c              b              c      b

q1   3.261 (0.003)  0.029 (0.000)  3.298  0.027   3.232 (0.003)  0.030 (0.000)  3.264  0.028
q2   1.875 (0.002)  0.433 (0.001)  1.878  0.410   1.884 (0.002)  0.420 (0.001)  1.881  0.399
q3   1.645 (0.002)  0.723 (0.001)  1.642  0.690   1.647 (0.002)  0.707 (0.001)  1.641  0.676
q4   1.471 (0.002)  1.017 (0.001)  1.477  0.970   1.465 (0.002)  1.000 (0.001)  1.470  0.954
q5   1.414 (0.002)  1.277 (0.002)  1.405  1.233   1.411 (0.002)  1.251 (0.002)  1.401  1.208
q6   1.382 (0.002)  1.512 (0.002)  1.387  1.477   1.383 (0.002)  1.477 (0.002)  1.389  1.443
q7   1.233 (0.002)  1.862 (0.003)  1.234  1.811   1.232 (0.002)  1.823 (0.003)  1.239  1.776
q8   1.187 (0.002)  2.155 (0.005)  1.192  2.116   1.192 (0.002)  2A    (0.005)  1.196  2.079
q9   1.112 (0.002)  2.508 (0.007)  1.111  2.470   1.113 (0.002)  2.455 (0.007)  1.112  2.415
q10  1.069 (0.003)  2.876 (0.011)  1.078  2.896   1.062 (0.003)  2.818 (0.011)  1.074  2.831
q11  1.048 (0.003)  2.961 (0.014)  1.066  3.016   1.055 (0.003)  2.927 (0.014)  1.069  2.972
q12  1.016 (0.004)  3.048 (0.018)  1.033  3.123   1.015 (0.004)  3.006 (0.017)  1.034  3.076
q13  1.002 (0.004)  3.063 (0.020)  1.021  3.151   1.001 (0.004)  3.033 (0.020)  1.020  3.115
q14  0.981 (0.005)  3.054 (0.023)  1.003  3.153   0.990 (0.005)  3.033 (0.023)  1.012  3.134
q15  0.961 (0.006)  3.015 (0.027)  0.985  3.133   0.978 (0.006)  3.004 (0.027)  1.003  3.132
q16  0.941 (0.008)  2.867 (0.036)  0.961  2.980   0.937 (0.008)  2.871 (0.037)  0.957  2.987
q17  0.937 (0.010)  2.798 (0.040)  0.951  2.899   0.927 (0.010)  2.780 (0.041)  0.947  2.887
q18  0.902 (0.011)  2.649 (0.046)  0.902  2.677   0.925 (0.012)  2.644 (0.046)  0.940  2.726
q19  0.994 (0.028)  2.256 (0.084)  0.971  2.201   0.962 (0.027)  2.134 (0.080)  0.923  2.063
q20  0.999 (0.039)  2.245 (0.118)  0.967  2.139   1.011 (0.040)  2.037 (0.107)  0.933  1.879
q21  0.949 (0.083)  2.686 (0.330)  0.957  2.801   1.288 (0.115)  3.387 (0.455)  1.234  3.272


The numbers in parentheses give the standard deviations of the estimates.
Reproduced from [330]
One can go further and ask which of these models are sufficient to describe
the data compared with the comprehensive distribution (2.23) encompassing
all of them. Here, the four distributions (2.26-2.29) are compared with the
comprehensive distribution (2.23) using Wilks’ theorem [485] on maximum
likelihood ratios, which allows one to compare nested hypotheses. It will be
shown that the Pareto and the SE models are the most parsimonious. We then
turn to a direct comparison of the best two-parameter models (the SE and
log-Weibull models) with the best one-parameter model (the Pareto model),
which will require an extension of Wilks’ theorem derived in Appendix 2.D.
This extension allows us to directly test the SE model against the Pareto model.


Comparison Between the Four Parametric Families (2.26—2.29) 
and the Comprehensive Distribution (2.23) 
According to Wilks’ theorem, the doubled log-likelihood ratio Λ,

$$\Lambda = 2 \log \frac{\max_{\Theta} \mathcal{L}(\mathrm{CD}, X, \Theta)}{\max_{\theta} \mathcal{L}(z, X, \theta)} , \tag{2.39}$$

has asymptotically (as the size N of the sample X tends to infinity) the χ²-
distribution. Here ℒ denotes the likelihood function, θ and Θ are the parametric
spaces corresponding to hypotheses z and CD (comprehensive distribution
defined in (2.23)) respectively (hypothesis z is one of the four hypotheses
(2.26-2.29) that are particular cases of the CD under some parameter
restrictions recalled in Sect. 2.4.1). The statement of the theorem is valid under
the condition that the sample X obeys the hypothesis z for some particular
value of its parameter belonging to the space θ. The number of degrees of
freedom of the χ²-distribution is equal to the difference of the dimensions
of the two spaces Θ and θ. We have dim(Θ) = 3, dim(θ) = 2 for the SE
and for the incomplete Gamma distributions while dim(θ) = 1 for the Pareto
and the Exponential distributions. This leads to one degree of freedom of the
χ²-distribution for the two former cases and two degrees of freedom of the
χ²-distribution for the latter models. The maximum of the likelihood in the
numerator of (2.39) is taken over the space Θ, whereas the maximum of the
likelihood in the denominator of (2.39) is taken over the space θ. Since we
always have θ ⊂ Θ, the likelihood ratio is never smaller than 1, and the
log-likelihood ratio is non-negative. If the observed value of Λ does not exceed
some high-confidence level (say, the 99% confidence level) of the χ², we then
reject the hypothesis CD in favor of the hypothesis z, considering the space Θ
redundant. Otherwise, we accept the hypothesis CD, considering the space θ
insufficient.
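
The decision rule above can be sketched in a few lines. The sketch takes the two maximized log-likelihoods as given (computing them requires the densities (2.23) and (2.26-2.29), which are not reproduced here) and uses the closed forms of the χ² survival function for one and two degrees of freedom, the only cases needed in this comparison. The function names and the default level are illustrative assumptions.

```python
import math


def chi2_sf(x, dof):
    """Survival function of the chi-square law for dof = 1 or 2 (closed forms)."""
    if x <= 0.0:
        return 1.0
    if dof == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if dof == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only 1 or 2 degrees of freedom are handled in this sketch")


def wilks_test(loglik_cd, loglik_z, dof, level=0.95):
    """Return (Lambda, p_value, z_is_sufficient) for hypothesis z nested in CD.

    loglik_cd and loglik_z are the maximized log-likelihoods over the
    larger and the restricted parameter spaces, respectively.
    """
    lam = 2.0 * (loglik_cd - loglik_z)
    p_value = chi2_sf(lam, dof)
    # z is retained (CD considered redundant) when Lambda stays below the
    # critical value of the chi-square law at the chosen level.
    return lam, p_value, p_value > 1.0 - level
```

With dof = 1 the rule applies to the SE and incomplete Gamma hypotheses, and with dof = 2 to the Pareto and Exponential hypotheses.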

The double log-likelihood ratios (2.39) are shown for the positive and neg- 
ative branches of the distribution of returns in Fig. 2.9 for the Nasdaq Com- 
posite index. Similar results (not shown) are obtained for the Dow Jones and 
the Standard & Poor’s 500 (1, 5, 30 and 60 minutes) indices. 
[Figure 2.9: two panels, each plotting the Wilks statistic (doubled
log-likelihood ratio) against the lower threshold u.]

Fig. 2.9. Wilks statistic for the comprehensive distribution versus the four
parametric distributions: Pareto ( ), Stretched-Exponential (*), Exponential (o)
and incomplete Gamma (∇) for the Nasdaq 5-minute returns. The upper (lower) panel
refers to the positive (negative) returns. The horizontal lines represent the critical
values at the 95% confidence level of the test for the χ²-distribution with one (lower
line) and two (upper line) degrees of freedom. Reproduced from [330]
For the Nasdaq data, Figure 2.9 clearly shows that the Exponential
distribution is completely insufficient: for all lower thresholds, the Wilks
log-likelihood ratio exceeds the critical value corresponding to the 95% level
of the χ² distribution. The Pareto distribution is insufficient for thresholds
corresponding to quantiles less than q11 = 92.5% and becomes comparable with
the comprehensive distribution beyond. It is natural that the families with two
parameters, the incomplete Gamma and the SE, have higher goodness-of-fit
than the one-parameter Exponential and Pareto distributions. The incomplete
Gamma distribution is comparable with the comprehensive distribution beyond
quantile q10 = 90%, whereas the SE is somewhat better beyond quantile
q8 = 70%. For the tails representing 7.5% of the data, all parametric families
except for the Exponential distribution fit the sample distribution with almost
the same efficiency according to this test.

The results obtained for the Dow Jones data are similar. The SE is comparable
with the comprehensive distribution starting with q8 = 70%. On the
whole, one can say that the SE distribution performs better than the three 
other parametric families. 

The situation is somewhat different for the Standard & Poor’s 500 index. 
For the positive tail, none of the four distributions is really sufficient in order 
to accurately describe the data. The comprehensive distribution is overall 
the best. In the negative tail, we retrieve a behavior more similar to that 
observed in the two previous cases, except for the Exponential distribution 
which also appears to be better than the comprehensive distribution. However, 
it should be noted that the comprehensive distribution is only rejected in the 
very far tail. The four models (2.26—2.29) are better than the comprehensive 
distribution only for the two highest quantiles (q20 and q21) of the negative
tail. In contrast, the Pareto, SE and incomplete Gamma models are better 
than the comprehensive distribution over the 10 highest quantiles (or so) for 
the Nasdaq and the Dow Jones. 

We should stress again that each log-likelihood ratio, so to speak, “acts on its
own ground”, that is, the corresponding χ²-distribution is valid under the
assumption of the validity of each particular hypothesis whose likelihood stands
in the denominator of the doubled log-likelihood ratio (2.39). It would be
desirable to compare all combinations of pairs of hypotheses directly, in
addition to comparing each of them with the comprehensive distribution.
Unfortunately, Wilks’ theorem cannot be used in the case of pair-wise comparison
because the problem is no longer that of comparing nested hypotheses (i.e.,
hypotheses of which one is a particular case of the comprehensive model). As a
consequence, the previous results on the comparison of the relative merits of
each of the four distributions using the generalized log-likelihood ratio should
be interpreted with care, in particular in the case of contradictory conclusions.
Fortunately, the main conclusion of the comparison (an advantage of the SE
distribution over the three other distributions) does not contradict the earlier
results discussed above.
Pair-Wise Comparison of the Pareto Model
with the Stretched-Exponential and Log-Weibull Models


Let us compare formally the descriptive power of the SE distribution and the 
log-Weibull distribution (the two best two-parameter models qualified until 
now) with that of the Pareto distribution (the best one-parameter model). 
For the comparison of the log-Weibull model versus the Pareto model, Wilks’ 
theorem can still be applied since the log-Weibull distribution encompasses 
the Pareto distribution. A contrario, the comparison of the SE versus the 
Pareto distribution should in principle require that we use the methods for 
testing non-nested hypotheses [209], such as the Wald encompassing test or 
the Bayes factors [266]. Indeed, the Pareto model and the SE model are
not, strictly speaking, nested. However, as shown in Sect. 2.4.1, the Pareto
distribution is a limiting case of the SE distribution, as the fractional exponent
c goes to zero. Changing the parametric representation of the SE model into

$$f(x|b,c) = b\, u^{-c}\, x^{c-1} \exp\left[-\frac{b}{c}\left(\left(\frac{x}{u}\right)^{c} - 1\right)\right], \quad x \geq u, \tag{2.40}$$

i.e., setting b = c (u/d)^c, where the parameter d refers to the former SE
representation (2.27), Appendix 2.D shows that the doubled log-likelihood
ratio

$$W = 2 \log \frac{\max_{b,c} \mathcal{L}_{SE}}{\max_{b} \mathcal{L}_{PD}} \tag{2.41}$$

still follows Wilks’ statistic, namely is asymptotically distributed according to
a χ²-distribution, with one degree of freedom in the present case. Thus, even
in this case of non-nested hypotheses, Wilks’ statistic still allows us to test
the null hypothesis H0 according to which the Pareto model is sufficient to
describe the data.
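
As an illustration, the statistic W of (2.41) can be computed directly in the (b, c) parametrization of (2.40). The sketch below is a simplified stand-in for the procedure of [330], under stated assumptions: the sample contains only exceedances above a known threshold u, the Pareto density is b u^b / x^(b+1), and the maximization over c uses a plain grid whose bounds (c_max, grid) are arbitrary choices.

```python
import math


def pareto_vs_se(sample, u, c_max=3.0, grid=120):
    """Doubled log-likelihood ratio W of (2.41), SE written as in (2.40).

    Returns (W, p_value, c_hat). The grid search over c is an
    illustrative shortcut.
    """
    r = [math.log(x / u) for x in sample if x > u]  # ln(x/u) > 0
    n = len(r)
    s1 = sum(r)

    # Pareto model: b_hat = n / sum ln(x/u), log-likelihood at its maximum.
    b_pd = n / s1
    ll_pd = n * math.log(b_pd) - n * math.log(u) - (b_pd + 1.0) * s1

    def profile_ll(c):
        # For fixed c, the MLE of b is n * c / sum((x/u)**c - 1).
        s = sum(math.expm1(c * ri) for ri in r)
        b = n * c / s
        return n * math.log(b) - n * math.log(u) + (c - 1.0) * s1 - n

    # The smallest candidate (c close to 0) reproduces the Pareto limit.
    candidates = [1e-6 + c_max * k / grid for k in range(grid + 1)]
    c_hat = max(candidates, key=profile_ll)
    w = max(0.0, 2.0 * (profile_ll(c_hat) - ll_pd))
    p_value = math.erfc(math.sqrt(w / 2.0))  # chi-square, one degree of freedom
    return w, p_value, c_hat
```

A small p-value leads to rejecting H0 (Pareto sufficient) in favor of the SE model; a value of c_hat at the lower end of the grid signals that the SE fit has collapsed onto its Pareto limit.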

Concerning the comparison between the Pareto model and the SE one, the
null hypothesis is found to be more often rejected for the Dow Jones than for
the Nasdaq and the Standard & Poor’s 500 [330]. Indeed, beyond the quantile
q12 = 95%, the Pareto model cannot be rejected in favor of the SE model at
the 95% confidence level for the Nasdaq and the Standard & Poor’s 500 data.
For the Dow Jones, one must consider quantiles higher than q18 = 99% (at
least for the negative tail) in order not to reject H0 at the 95% significance
level. These results are in qualitative agreement with what we could expect
from the action of the central limit theorem: the power law regime (if it really
exists) is pushed back to higher quantiles due to time aggregation (recall that
the Dow Jones data is at the daily scale while the Nasdaq data is at the
5-minute time scale).

It is, however, more difficult to rationalize the fact reported in [330] that
the SE model is not rejected (at the 99% confidence level) for the two highest
quantiles (q20 = 99.95% and q21 = 99.99%) of the negative tail of the 1-minute
returns of the Standard & Poor’s 500 and for the quantiles q19 = 99.9% and
q20 = 99.95% for its positive tail. This might be ascribed to a lack of power
of the test, but recall that we have restricted our investigation to empirical
quantiles with more than a hundred points (or so). Therefore, invoking a lack
of power is not very convincing. In addition, for these high quantiles, the
fractional exponent c in the SE model becomes significantly different from
zero (see Table 2.5). It could be an empirical illustration of the existence of
a cut-off beyond which the power law regime is replaced by an exponential
(or stretched-exponential) decay of the distribution function as suggested by
Mantegna and Stanley [344] and by the recent model [493] based upon a pure
jump Lévy process, whose jump arrival rate obeys a power law dampened
by an exponential function. To strengthen this idea, it can be noted that the
exponential distribution is found sufficient to describe the distributions of the
1-minute returns of the Standard & Poor’s 500, while it is always rejected
(with respect to the comprehensive distribution) for the Nasdaq and the Dow
Jones. Thus, this non-rejection could really be the genuine signature of a
cut-off beyond which the decay of the distribution is faster than any power law.
However, this conclusion is only drawn from the one hundred most extreme
data points and, therefore, should be considered with caution. Larger samples
should be considered to obtain a confirmation of this intuition. Unfortunately,
samples with more than 10 million (non zero) data points (for a single asset)
are not yet accessible.

Based upon the study of [330], Wilks’ test for the Pareto distribution
versus the log-Weibull distribution shows that, for quantiles above q12, the
Pareto distribution cannot be rejected in favor of the log-Weibull for the Dow
Jones, the Nasdaq and the Standard & Poor’s 500 30-minute returns. This
parallels the lack of rejection of the Pareto distribution against the SE beyond
the significance level q12. The picture is different for the 1-minute returns of
the Standard & Poor’s 500. The Pareto model is almost always rejected. The
most interesting point is the following: in the negative tail, the Pareto model
is always strongly rejected except for the highest quantiles. Comparing with
Table 2.6, one clearly sees that between q15 and q18 the exponent c is
significantly (at the 95% significance level) less than one, indicating a tail fatter
than any power law. On the contrary, for q21, the exponent c is found
significantly larger than one, indicating a change of regime and again an ultimate
decay of the tail of the distribution faster than any power law.

In summary, the null hypothesis that the true distribution is the Pareto 
distribution is strongly rejected until quantiles 90-95% or so. Thus, within 
this range, the Stretched-Exponential and log-Weibull models seem the best 
and the Pareto model is insufficient to describe the data. But, for the very 
highest quantiles (above 95%-98%), one cannot reject any more the hypothe- 
sis that the Pareto model is sufficient compared with the SE and log-Weibull 
models. These two-parameter models can then be seen as a redundant
parameterization for the extremes compared with the Pareto distribution, except
for the returns calculated at the smallest time scales. 
2.5 Discussion and Conclusions


2.5.1 Summary 


This chapter has revisited the generally accepted fact that the tails of the 
distributions of returns present a power-like behavior. Often, the conviction 
of the existence of a power-like tail is based on the Gnedenko theorem stating 
the existence of only three possible types of limit distributions of normalized 
maxima (a finite maximum value, an exponential tail, and a power-like tail) 
together with the exclusion of the first two types by empirical evidence. The 
power-like character of the tails of the distribution of log-returns follows then 
simply from the power-like distribution of maxima. However, in this chain 
of arguments, the conditions needed for the fulfillment of the correspond- 
ing mathematical theorems are often omitted and not discussed properly. In 
addition, widely used arguments in favor of power law tails invoke the self- 
similarity of the data but are often assumptions rather than experimental 
evidence or consequences of economic and financial laws. 

Sharpening and generalizing the results obtained by Kearns and Pagan 
[267], Sect. 2.3.3 has recalled that standard statistical estimators of heavy 
tails are much less efficient than often assumed and cannot in general clearly 
distinguish between a power law tail and a SE tail (even in the absence of 
long-range dependence in the volatility). So, in view of the stalemate reached 
with the nonparametric approaches and in particular with the standard ex- 
treme value estimators, resorting to a parametric approach appears essential. 
The parametric approach is useful to decide which class of extreme value 
distributions — rapidly versus regularly varying — accounts best for the em- 
pirical distributions of returns at different time scales. However, here again, 
the problem is not as straightforward as it appears. Indeed, in order to apply
statistical methods to the study of empirical distributions of returns and to de- 
rive their resulting implication for risk management, it is necessary to keep in 
mind the existence of necessary conditions that the empirical data must obey 
for the conclusions of the statistical study to be valid. Maybe the most impor- 
tant condition in order to speak meaningfully about distribution functions is 
the stationarity of the data, a difficult issue that we have barely touched upon 
here. In particular, the importance of regime switching is now well established 
[14, 397] and its possible role should be assessed and accounted for. 


2.5.2 Is There a Best Model of Tails? 


The finding that standard statistical estimators of heavy tails are much less
efficient than often assumed, and cannot in general clearly distinguish between
a power law tail and a SE tail, can be rationalized by the fact that, in a
certain limit, the Stretched-Exponential pdf tends to the Pareto distribution (see
(2.30-2.31) and Appendix 2.B). Thus, the Pareto (or power law) distribution
can be approximated with any desired accuracy on an arbitrary interval by a
suitable adjustment of the pair (c,d) of the parameters of the
Stretched-Exponential pdf. The parametric tests presented above indicate that
the classes of SE and log-Weibull distributions provide a significantly better
fit to empirical returns than the Pareto, the exponential or the incomplete
Gamma distributions. All these tests are consistent with the conclusion that
these two models provide the best effective (apparent) and parsimonious models
to account for the empirical data on the largest possible range of returns.

However, this does not mean that the Stretched-Exponential or the
log-Weibull model is the correct description of the tails of empirical distributions
of returns. Again, as already mentioned, the strength of these models comes
from the fact that they encompass the Pareto model in the tail and offer a
better description in the bulk of the distribution. To see where the problem
arises, Table 2.7 summarizes the best ML-estimates for the SE parameters c
(form parameter) and d (scale parameter) restricted to the quantiles beyond
q12 = 95%, which offers a good compromise between a sufficiently large sample
size and a restricted tail range leading to an accurate approximation in this
range.

One can see that c is very small (and all the more so for the scale parameter 
d) for the tail of the distribution of positive returns of the Nasdaq data, 
suggesting a convergence to a power law tail. The exponents c for the negative 
returns of the Nasdaq data and for both positive and negative returns of the 
Dow Jones data are an order of magnitude larger but the statistical tests show 
that they are not incompatible with an asymptotic power tail either. Indeed, 
Sect. 2.4.4 has shown that, for the very highest quantiles (above 95-98%), 
one cannot reject the hypothesis that the Pareto model is sufficient compared 
with the SE model. The values of c and d are even smaller for the Standard 
& Poor’s 500 data both at the 1-minute and 5-minute time scales. 


Table 2.7. Best parameters c and d of the Stretched-Exponential model and best
parameter b of the Pareto model estimated beyond quantile q12 = 95% for the Dow
Jones (DJ), the Nasdaq (ND) and the Standard & Poor’s 500 (SP) indices. The
apparent Pareto exponent c(u(q12)/d)^c (see expression (2.30)) is also shown


Sample                    c               d                       c(u(q12)/d)^c   b
DJ pos. returns           0.274 (0.111)   4.81 × 10^-6            2.68            2.79 (0.10)
DJ neg. returns           0.362 (0.119)   1.02 × 10^-4            2.57            2.77 (0.11)
ND pos. returns           0.039 (0.138)   4.54 × 10^-52           3.03            3.23 (0.14)
ND neg. returns           0.273 (0.155)   1.90 × 10^-7            3.10            3.35 (0.15)
SP pos. returns (1 min)   -               -                       3.01            3.02 (0.02)
SP neg. returns (1 min)   -               -                       2.97            2.97 (0.02)
SP pos. returns (5 min)   0.033 (0.031)   3.06 × 10^-59           2.95            2.95 (0.03)
SP neg. returns (5 min)   0.033 (0.031)   3.26 × 10^-[illegible]  2.87            2.86 (0.03)
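
The “apparent Pareto exponent” reported in Table 2.7 can be read as the local logarithmic slope of the SE survival function evaluated at the threshold. A short derivation follows, assuming that the SE tail of (2.27) has the conditional survival function written on the first line; this form is inferred from the reparametrization b = c (u/d)^c used in (2.40), since (2.27) itself is not reproduced in this section.

```latex
\bar F_{SE}(x) = \exp\left[-\frac{x^{c}-u^{c}}{d^{c}}\right], \quad x \geq u
\quad\Longrightarrow\quad
-\frac{\mathrm{d} \ln \bar F_{SE}(x)}{\mathrm{d} \ln x} = c\left(\frac{x}{d}\right)^{c},
\qquad
\left. -\frac{\mathrm{d} \ln \bar F_{SE}(x)}{\mathrm{d} \ln x} \right|_{x=u} = c\left(\frac{u}{d}\right)^{c} .
```

For a Pareto tail (u/x)^b the same slope equals b for all x, which is why the column c(u(q12)/d)^c is directly comparable with the Pareto estimate b in the last column.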
Note also that the exponents c are larger for the daily Dow Jones data
than for the 5-minute Nasdaq data and the 1-minute and 5-minute Standard
& Poor’s 500 data, in agreement with an expected (slow) convergence to the
Gaussian law according to the central limit theory.¹⁸ However, a t-test does
not allow one to reject the hypothesis that the exponents c remain the same for 
the positive and negative tails of the Dow Jones data. This confirms previous 
results, for instance [319, 256] according to which the extreme tails can be 
considered as symmetric, at least for the Dow Jones data. In contrast, there is 
a very strong asymmetry for the 5-minute sampled Nasdaq and the Standard 
& Poor’s 500 data. 

This is the evidence in favor of the existence of an asymptotic power
law tail. Balancing this view, many of the tests have shown that the power
law model is not as powerful as the SE and log-Weibull models,
even arbitrarily far in the tail (as far as the available data allow us to probe).
In addition, for the smallest time scales, the tail of the distribution of returns
is, over a large range, well-described by a log-Weibull distribution with an
exponent c less than one, i.e., is fatter than any power law. A change of
regime is ultimately observed and the very extreme tail decays faster than any
power law. Both a SE and a log-Weibull model with exponent c > 1 provide a
reasonable description.

Attempting to wrap up the different results obtained by the battery of
tests presented here, we can offer the following conservative conclusion: it
seems that the tails of the distributions examined here are decaying faster
than any (reasonable) power law but slower than any Stretched-Exponential.
Maybe log-normal distributions could offer a better effective description of
the distribution of returns,¹⁹ as suggested in [436].

In sum, in the most practical case, the Pareto distribution is sufficient
above quantiles q12 = 95% but is not stable enough to ascertain with strong
confidence an asymptotic power law nature of the pdf.

18 See [453] and Figs. 3.6-3.9 pp. 81-82 of [451] where it is shown that SE
distributions are approximately stable in family and the effect of aggregation
can be seen to slowly increase the exponent c. See also [137] which studies
specifically this convergence to a Gaussian law as a function of the time scale.

19 Let us stress that we are speaking of a log-normal distribution of returns, not of
price! Indeed, the standard Black and Scholes model of a log-normal distribution
of prices is equivalent to a Gaussian distribution of returns. Thus, a log-normal
distribution of returns is much more fat-tailed, and in fact bracketed by power
law tails and Stretched-Exponential tails.


2.5.3 Implications for Risk Assessment


The correct description of the distribution of returns has important
implications for the assessment of large risks not yet sampled by historical time
series. Indeed, the whole purpose of a characterization of the functional form
of the distribution of returns is to extrapolate currently available historical
time series beyond the range provided by the empirical reconstruction of the
distributions. For risk management, the determination of the tail of the
distribution is crucial. Indeed, many risk measures, such as the Value-at-Risk or
the expected-shortfall, are based on the properties of the tail of the
distributions of returns. In order to assess risk at probability levels of 95% or so,
nonparametric methods have merits. However, in order to estimate risks at
high probability levels such as 99% or larger, nonparametric estimations fail by
lack of data and parametric models become unavoidable. This shift in strategy
has a cost and replaces sampling errors by model errors. The considered
distribution can be too thin-tailed, as when using normal laws, and risk will be
underestimated, or it can be too fat-tailed and risk will be overestimated, as
with Lévy laws and possibly with regularly varying distributions. In each case,
large amounts of money are at stake and can be lost due to an overly conservative
or overly optimistic risk measurement.

In order to bypass these problems, many authors [34, 313, 355, among
others] have proposed to estimate the extreme quantiles of the distributions
in a semiparametric way, which allows one (i) to avoid the model errors and (ii)
to limit the sampling errors with respect to nonparametric methods and thus
to keep a reasonable accuracy in the estimation procedure. For this aim, it has
been suggested to use the extreme value theory.²⁰ However, as emphasized in
Sect. 2.3.3, estimates of the parameters of such (GEV or GPD) distributions
can be very unreliable in the presence of dependence, so that these methods
finally appear to be not very accurate and one cannot avoid a parametric
approach for the estimation of the highest quantiles.

The above analysis suggests that the Paretian paradigm leads to an
overestimation of the probability of large events and therefore leads to the
adoption of too conservative positions. Generalizing to larger time scales, the
overly pessimistic view of large risks deriving from the Paretian paradigm should
be all the more revised, due to the action of the central limit theorem. The above
comparison between several models which turn out to be almost indistinguishable,
such as the Stretched-Exponential, the Pareto and the log-Weibull
distributions, offers the important possibility of developing scenarios that can
test the sensitivity of risk assessment to errors in the determination of
parameters and, even more interestingly, to the choice of models, often
referred to as model errors.

Finally, an additional note of caution is in order. This chapter has
focused on the marginal distributions of returns calculated at fixed time scales
and thus neglects the possible occurrence of runs of dependencies, such as
in cumulative drawdowns. In the presence of dependencies between returns,
and especially if the dependence is nonstationary and increases in times of
stress, the characterization of the marginal distributions of returns is not
sufficient. As an example, Johansen and Sornette [249] (see also Chap. 3 of [450])
have recently shown that the recurrence time of very large drawdowns cannot
be predicted from the sole knowledge of the distribution of returns and that
transient dependence effects occurring in times of stress make very large
drawdowns more frequent, qualifying them as abnormal “outliers” (other names
are “kings” or “black swans”).


20 See, for instance, http://www.gloriamundi.org for an overview of the extensive
application of EVT methods for VaR and expected-shortfall estimation.


Appendix 
2.A Definition and Main Properties of Multifractal Processes 


The traditional description of the dynamics of asset prices initiated by Bache- 
lier [26] was based upon the Brownian motion and then the geometric Brown- 
ian motion [377, 425]. But it is now widely recognized that these descriptions 
suffer from two major discrepancies. As shown in this chapter, the stationary 
distribution of asset returns is far from the Gaussian law (it exhibits fat tails) 
and, in addition, the volatility of asset returns has long range dependence (or 
long memory), which is characterized by the alternation of periods of small 
price changes and periods of large price changes. 

In the mathematical literature, stochastic processes are said to exhibit 
a long memory when their autocovariance function decays hyperbolically 
[85, 213, 222]. Fractionally integrated processes like ARFIMA^21 and
FIGARCH^22 [31] processes are discrete time processes that enjoy this property.
The first class is not suitable for the modeling of financial assets returns in 
so far as it yields long memory in the returns themselves. In contrast, the 
second class leads to long memory properties in the squared returns, which 
is more appropriate. However, an important question remains open concern- 
ing FIGARCH processes: is this kind of representation time consistent? That 
is, given that the daily returns of an asset can be modeled by a FIGARCH 
process, can we still model the monthly returns of this asset by a FIGARCH 
process? In other words, is the class of FIGARCH processes closed under time 
aggregation? 

If time-consistency is not obeyed, the comparison of the discrete-time 
model with empirical data at all time-scales simultaneously imposes strong 
additional restrictions on the model. It is thus highly desirable that a suitable 
discrete-time model be time-consistent. Note that continuous-time models are 
time-consistent by construction, justifying the emphasis on continuous-time 
stochastic processes with long memory. 

This Appendix presents useful results on a family of continuous-time sto- 
chastic processes which enjoy the property of long memory, the so-called mul- 
tifractal process, born from the generalization of the seminal works by Man- 
delbrot on the notions of self-similarity and fractality. 


21 Fractionally integrated autoregressive moving average. 
22 Fractionally integrated GARCH. 
2.A.1 Self-similar Processes, Multiplicative Cascades
and Multifractal Processes 


Before presenting two examples of multifractal processes with suitable
properties for the modeling of financial asset prices, it is useful to describe their
underpinning. First, let us recall that given the filtered space
$(\Omega, \{\mathcal{F}_t\}_{t \ge 0}, \mathbb{P})$, the stochastic process
$\{X(t)\}$ (with $X(0) = 0$) is self-similar with exponent $H > 0$ if, by
definition, for all $\lambda, k, t_1, \ldots, t_k > 0$

$$ \left(X(\lambda t_1), \ldots, X(\lambda t_k)\right) \stackrel{\rm law}{=} \lambda^H \left(X(t_1), \ldots, X(t_k)\right) . \tag{2.A.1} $$


The most famous self-similar stochastic process is obviously the Brownian
motion, whose exponent is H = 1/2. It belongs to the family of self-similar
processes with stationary Gaussian increments, namely the Fractional Brownian
Motions, whose exponent H ranges over ]0, 1[; when 0 < H < 1/2, the autocorrelation
of the increments is negative (antipersistence) while it is positive (persistence)
when 1/2 < H < 1. In the latter case, the Fractional Brownian Motion exhibits
long memory.

Let us consider the law of the increments $\delta_l X(t) = X(t) - X(t-l)$.
Assuming the stationarity of these increments, the law of $\delta_l X(t)$ is the
same as the law of $X(l)$ (since $X(0) = 0$). Thus, if $X(t)$ is self-similar with
an exponent $H$, it is easy to prove that the $q$-order moments of $\delta_l X(t)$
and $\delta_L X(t)$, denoted by $M(q,l)$ and $M(q,L)$ respectively, are related by

$$ M(q,l) = \left(\frac{l}{L}\right)^{qH} M(q,L) . \tag{2.A.2} $$

This is called a “monofractal” process, characterized by a linear dependence
of the moment exponent $\zeta(q) = qH$ on the moment order $q$.
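
The monofractal scaling (2.A.2) can be checked numerically on the simplest case. The sketch below assumes a discrete Gaussian random walk as a stand-in for the Brownian motion (H = 1/2) and estimates the moment exponent by a least-squares fit of ln M(q, l) against ln l; the helper names and the choice of lags are illustrative.

```python
import math
import random

def moment(path, lag, q):
    """Empirical q-order moment M(q, l) of the increments X(t) - X(t - l)."""
    values = [abs(path[t] - path[t - lag]) ** q for t in range(lag, len(path))]
    return sum(values) / len(values)

def scaling_exponent(path, q, lags):
    """Least-squares slope of ln M(q, l) against ln l, an estimate of zeta(q)."""
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(moment(path, lag, q)) for lag in lags]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

rng = random.Random(1)
path = [0.0]
for _ in range(2 ** 15):
    path.append(path[-1] + rng.gauss(0.0, 1.0))

lags = [1, 2, 4, 8, 16, 32]
for q in (1, 2, 3):
    print(q, scaling_exponent(path, q, lags), "to be compared with qH =", q / 2)
```

For a monofractal process the fitted exponents should line up on the straight line qH; a concave departure from this line is the signature of multifractality discussed next.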

A multifractal process is obtained by using a weaker form of self-similarity.
Instead of the simple scaling rule

$$ X(\lambda t) \stackrel{\rm law}{=} \lambda^H \cdot X(t) \tag{2.A.3} $$

induced by (2.A.1), multifractal processes enjoy the more general property

$$ X(\lambda t) \stackrel{\rm law}{=} K(\lambda) \cdot X(t) \tag{2.A.4} $$

where $X$ and $K$ are independent random variables. This generalized scaling
rule induces strong restrictions on the distribution of the stochastic process.
For instance, it is straightforward to show that for all $\lambda, k, t_1, \ldots, t_k > 0$

$$ \frac{X(\lambda t_1)}{X(t_1)} \stackrel{\rm law}{=} \frac{X(\lambda t_2)}{X(t_2)} \stackrel{\rm law}{=} \cdots \stackrel{\rm law}{=} \frac{X(\lambda t_k)}{X(t_k)} \tag{2.A.5} $$

and that $K(u \cdot v) \stackrel{\rm law}{=} K'(u) \cdot K''(v)$, where $K'$ and $K''$
are two independent copies of $K$. This last relation implies (provided that the
expectations are finite)

$$ E\left[|K(u \cdot v)|^q\right] = E\left[|K(u)|^q\right] \cdot E\left[|K(v)|^q\right] , \tag{2.A.6} $$

which immediately yields

$$ E\left[|K(\lambda)|^q\right] = \lambda^{\zeta(q)} , \quad \forall \lambda > 0 , \tag{2.A.7} $$

for some real-valued function $\zeta(\cdot)$ such that $\zeta(0) = 0$. Considering
the relation between the $q$-order moments of $\delta_l X(t)$ and $\delta_L X(t)$,
(2.A.2) generalizes as follows

$$ M(q,l) = \left(\frac{l}{L}\right)^{\zeta(q)} M(q,L) . \tag{2.A.8} $$

The function $\zeta(q)$ defines the multifractal spectrum of the process.

Processes enjoying this scaling property can be derived from so-called
multiplicative cascades [17, 190, 340, 341]. It is convenient to present
multiplicative cascades with discrete scales $l_n = 2^{-n} L$. A multiplicative
cascade for the increments $\delta X$ is defined by relating the local variation of
the process $\delta_{l_n} X$ at scale $l_n$ to the variation at scale $L$ according to

$$ \delta_{l_n} X(t) = \left(\prod_{i=1}^{n} W_i\right) \delta_L X(t) , \tag{2.A.9} $$

where the $W_i$ are i.i.d. random positive factors. Realizations of such processes
can be constructed using orthonormal wavelet bases [16]. If one defines the
magnitude $\omega(t,l)$ at time $t$ and scale $l$ as [17],

$$ \omega(t,l) = \frac{1}{2} \ln\left(|\delta_l X(t)|^2\right) , \tag{2.A.10} $$

then the cascade (2.A.9) becomes a simple random walk as a function of the
logarithm of scales, at a fixed time $t$:

$$ \omega(t, l_{n+1}) = \omega(t, l_n) + \ln(W_{n+1}) . \tag{2.A.11} $$


Assuming that $W$ follows a log-normal law with parameters $(\mu, \lambda^2)$, the
magnitude $\omega$ admits a density at scale $l_n$, $Q_{l_n}(\omega)$, which
satisfies the simple equation

$$ Q_{l_n}(\omega) = \left(\varphi(\mu, \lambda^2)^{*n} * Q_L\right)(\omega) \tag{2.A.12} $$

where $*$ is the convolution product and $\varphi(\mu, \lambda^2)$ denotes the
Gaussian density function with mean $\mu$ and variance $\lambda^2$. Going back to
the original variable $\delta X$, the previous equation provides us with the
expression of the density function of $\delta X$ at scale $l_n$

$$ P_{l_n}(x) = \int G_{l_n,L}(u)\, e^{-u} P_L(e^{-u} x)\, du \tag{2.A.13} $$

where

$$ G_{l_n,L} = \varphi(\mu, \lambda^2)^{*n} = \varphi(n\mu, n\lambda^2) . $$
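
The random walk in the logarithm of scales (2.A.11) and the resulting Gaussian kernel with mean $n\mu$ and variance $n\lambda^2$ can be illustrated by a short simulation. This is only a sketch under the log-normal assumption stated above; the numerical values of $\mu$, $\lambda^2$ and $n$ are arbitrary, and the magnitude at the large scale $L$ is set to zero for convenience.

```python
import math
import random
from statistics import fmean, pvariance

def cascade_magnitude(rng, n_steps, mu, lam2, omega_L=0.0):
    """Magnitude after n_steps of the cascade, starting from omega_L at scale L."""
    omega = omega_L
    for _ in range(n_steps):
        # ln(W) is Gaussian with mean mu and variance lam2 when W is log-normal
        omega += rng.gauss(mu, math.sqrt(lam2))
    return omega

rng = random.Random(2)
mu, lam2, n = -0.05, 0.02, 8          # illustrative values only
sample = [cascade_magnitude(rng, n, mu, lam2) for _ in range(20_000)]
print("mean    ", fmean(sample), "to be compared with", n * mu)
print("variance", pvariance(sample), "to be compared with", n * lam2)
```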
Conversely, a process that satisfies (2.A.13) with a normal kernel $G$ can
be written as

$$ \delta_{l_n} X(t) \stackrel{\rm law}{=} W \cdot \delta_{l_{n-1}} X(t) \tag{2.A.14} $$

where $W$ is distributed according to a log-normal law with mean $\mu$ and
variance $\lambda^2$. By iterating this equation $n$ times, one thus recovers the
cascade (2.A.9). Therefore, the cascade picture across scales constitutes a
paradigm of multifractal self-similar processes. The log-normal cascade model on
the dyadic tree associated with the orthonormal wavelet representation leads to a
magnitude correlation function given by

$$ C_\omega(\tau, l) = {\rm Cov}\left(\omega(t,l), \omega(t+\tau,l)\right) \propto -\lambda^2 \ln(\tau/T) , \quad \text{for } l < \tau < T , \tag{2.A.15} $$

which is proportional to the logarithm of the lag $\tau$. The parameter $T$ is called
the “integral time scale” and is such that $C_\omega(\tau,l)$ is exactly 0 for $\tau > T$.


2.A.2 The Multifractal Spectrum 


We have introduced in (2.A.8) the $q$-order moment $M(q,l)$ of the increment
$\delta_l X(t)$ at scale $l$, which follows the scaling law

$$ M(q,l) \sim l^{\zeta(q)} , \tag{2.A.16} $$

and allows us to explore the multifractal properties of the multifractal
processes.

Let us define the Hölder exponent $\alpha(t_0)$ at time $t_0$ as

$$ \delta_l X(t_0) \sim_{l \to 0} l^{\alpha(t_0)} . \tag{2.A.17} $$

The multifractal spectrum $f(\alpha)$ is the fractal (Hausdorff) dimension of the
iso-Hölder exponent sets:

$$ f(\alpha) = {\rm Dim}\{t, \alpha(t) = \alpha\} . \tag{2.A.18} $$

Roughly speaking, this means that, at scale $l \ll T$, the number of times where
$\delta_l X(t) \sim l^{\alpha}$ is

$$ N(l,\alpha) \sim l^{-f(\alpha)} . \tag{2.A.19} $$

The multifractal formalism states that $f(\alpha)$ and $\zeta(q)$ are Legendre
transforms of each other:

$$ f(\alpha) = 1 + \min_q \left(q\alpha - \zeta(q)\right) , \tag{2.A.20} $$

$$ \zeta(q) = 1 + \min_\alpha \left(q\alpha - f(\alpha)\right) . \tag{2.A.21} $$

Therefore, in the multifractal formalism, $q$ is nothing but the value of the
derivative of $f(\alpha)$ and conversely $\alpha$ is the value of the derivative of $\zeta(q)$:

$$ q(\alpha^*) = \frac{\partial f}{\partial \alpha}\Big|_{\alpha = \alpha^*} , \qquad \alpha(q^*) = \frac{\partial \zeta}{\partial q}\Big|_{q = q^*} . \tag{2.A.22} $$
2.A.3 The Multifractal Model of Asset Returns
of Mandelbrot et al.


Mandelbrot et al. [341] have proposed a very simple way to obtain a
multifractal process with suitable properties for the modeling of asset returns. In
its simplest form, it is based upon the subordination of a Brownian motion
by a multifractal process. Indeed, considering the price process $\{P(t)\}_{t \ge 0}$, the
logarithm of the price

$$ X(t) = \ln P(t) - \ln P(0) \tag{2.A.23} $$

is assumed to be defined by

$$ X(t) = B[\theta(t)] \tag{2.A.24} $$

where $B(t)$ denotes the standard Brownian motion, assumed independent of
the stochastic process $\theta(t)$, which is a multifractal process with continuous,
nondecreasing paths and stationary increments.

It is easy to check that $X(t)$ enjoys the multifractal property and it is
straightforward to show that its multifractal spectrum $\zeta_X$ is related to the
multifractal spectrum $\zeta_\theta$ of $\theta$ by

$$ \zeta_X(q) = \zeta_\theta\left(\frac{q}{2}\right) . \tag{2.A.25} $$

In addition, as long as $B(t)$ is a Brownian motion without drift, the stochastic
process $\{X(t)\}_{t \ge 0}$ is a martingale with respect to its natural filtration
provided that $E[\theta^{1/2}] < \infty$. Besides, if $E[\theta]$ is finite, the
autocovariance function of the price return process $\delta_l X(t)$ vanishes for all
lags larger than $l$. The covariance of the absolute values of the price returns
(raised to the power $2q$) satisfies

$$ {\rm Cov}\left(|\delta_l X(t)|^{2q}, |\delta_l X(t+\tau)|^{2q}\right) = \kappa(q) \cdot {\rm Cov}\left(|\delta_l \theta(t)|^{q}, |\delta_l \theta(t+\tau)|^{q}\right) \tag{2.A.26} $$

with $\kappa(q) = \left(E\left[|B(1)|^{2q}\right]\right)^2$, so that the volatility of asset
returns exhibits long memory if the volatility of $\theta$ itself exhibits long memory,
i.e., if periods of intense trading activity alternate with periods of weak activity.


2.A.4 The Multifractal Random Walk (MRW) 


An alternative approach for modeling the dynamics of asset returns in terms
of multifractal processes has been introduced by Bacry et al. [27, 28] with
the aim of constructing a stationary process with continuous scale invariance
inspired from the standard hierarchical models presented in Sect. 2.A.1.
The MRW is a stochastic volatility model which has exact multifractal
properties, is invariant under continuous dilations, and possesses stationary
increments. It is constructed so as to mimic the crucial logarithmic dependence
(2.A.15) of the magnitude correlation function, at the basis of multifractality
in cascade processes.

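The MRW is constructed as the continuous limit for $\Delta t \to 0$ of the
discretized version $X_{\Delta t}$ (using a time discretization step $\Delta t$) defined by
adding up $t/\Delta t$ random variables:

$$ X_{\Delta t}(t) = \sum_{k=1}^{t/\Delta t} \delta X_{\Delta t}[k] . \tag{2.A.27} $$

The process $\{\delta X_{\Delta t}[k]\}_k$ is a noise whose variance is stochastic, i.e.,

$$ \delta X_{\Delta t}[k] = \epsilon_{\Delta t}[k]\, e^{\omega_{\Delta t}[k]} , \tag{2.A.28} $$

where $\omega_{\Delta t}[k]$ is the logarithm of the stochastic variance. $\epsilon_{\Delta t}$ is a
Gaussian white noise independent of $\omega$ and of variance $\sigma^2 \Delta t$.^23

Following the cascade model, $\omega_{\Delta t}$ is a Gaussian stationary process whose
covariance reads

$$ {\rm Cov}\left(\omega_{\Delta t}[k], \omega_{\Delta t}[l]\right) = \lambda^2 \ln \rho_{\Delta t}[|k - l|] , \tag{2.A.29} $$

where $\rho_{\Delta t}$ is chosen in order to mimic the correlation structure (2.A.15)
observed in cascade models:

$$ \rho_{\Delta t}[k] = \begin{cases} \dfrac{T}{(|k|+1)\Delta t} & \text{for } |k| \le T/\Delta t - 1 , \\ 1 & \text{otherwise.} \end{cases} \tag{2.A.30} $$

In order for the variance of $X_{\Delta t}(t)$ to converge when $\Delta t \to 0$, one must
choose the mean of the process $\omega_{\Delta t}$ such that [27]

$$ E\left(\omega_{\Delta t}[k]\right) = -{\rm Var}\left(\omega_{\Delta t}[k]\right) = -\lambda^2 \ln(T/\Delta t) , \tag{2.A.31} $$

for which ${\rm Var}(X_{\Delta t}(t)) = \sigma^2 t$.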
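
The role of the choice (2.A.31) can be made concrete with a small numerical check. For a Gaussian $\omega$, $E[e^{2\omega}] = \exp(2E[\omega] + 2{\rm Var}[\omega])$, which equals 1 when the mean is minus the variance, so that each increment has variance $\sigma^2 \Delta t$ and the uncorrelated increments add up to $\sigma^2 t$. The sketch below encodes the kernel (2.A.30) and verifies this for a single increment; the parameter values and function names are illustrative assumptions, and no attempt is made to simulate the full correlated process.

```python
import math
import random

def rho(k, T, dt):
    """Kernel (2.A.30): T / ((|k| + 1) dt) inside the integral scale, 1 outside."""
    return T / ((abs(k) + 1) * dt) if abs(k) <= T / dt - 1 else 1.0

def omega_covariance(k, l, lam2, T, dt):
    """Covariance (2.A.29) of the log-volatility at time indices k and l."""
    return lam2 * math.log(rho(k - l, T, dt))

sigma2, lam2, T, dt = 1.0, 0.02, 1.0, 0.001     # illustrative values only
variance = omega_covariance(0, 0, lam2, T, dt)
mean = -variance                                 # the choice (2.A.31)

# Closed form for a Gaussian omega: E[exp(2 omega)] = exp(2 mean + 2 variance).
print(math.exp(2 * mean + 2 * variance))

# Monte Carlo check of Var(delta X[k]) = sigma^2 dt for a single increment.
rng = random.Random(3)
n = 100_000
second_moment = 0.0
for _ in range(n):
    omega = rng.gauss(mean, math.sqrt(variance))
    eps = rng.gauss(0.0, math.sqrt(sigma2 * dt))
    second_moment += (eps * math.exp(omega)) ** 2
print(second_moment / n, "to be compared with", sigma2 * dt)
```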


23 Introducing an asymmetric dependence between $\omega$ and the noise $\epsilon$ allows one
to account for the leverage effect [170] while preserving the scale invariance
properties of the MRW, but forbids the existence of a limit as $\Delta t \to 0$ [388].


Multifractal Spectrum


Since, by construction, the increments of the model are stationary, the pdf of
$X_{\Delta t}(t+l) - X_{\Delta t}(t)$ does not depend on $t$ and is the same as that of
$X_{\Delta t}(l)$. In [27], it was proven that the moments of
$X(l) \equiv X_{\Delta t \to 0^+}(l)$ can be expressed as

$$ E\left(X(l)^{2p}\right) = \frac{\sigma^{2p} (2p)!}{2^p\, p!} \int_0^l du_1 \cdots \int_0^l du_p \prod_{i<j} \rho(u_i - u_j)^{4\lambda^2} , \tag{2.A.32} $$

where $\rho$ is defined by

$$ \rho(t) = \begin{cases} T/|t| & \text{for } |t| \le T , \\ 1 & \text{otherwise.} \end{cases} \tag{2.A.33} $$

Using this expression in the multiple integrals in (2.A.32), a straightforward
scaling argument leads to

$$ M(2p,l) = K_{2p} \left(\frac{l}{T}\right)^{p - 2p(p-1)\lambda^2} , \tag{2.A.34} $$

where

$$ K_{2p} = T^p \sigma^{2p} (2p-1)!! \int_0^1 du_1 \cdots \int_0^1 du_p \prod_{i<j} |u_i - u_j|^{-4\lambda^2} . \tag{2.A.35} $$

$K_{2p}$ is nothing but the moment of order $2p$ of the random variable $X(T)$ or
equivalently of $\delta_T X(t)$. Expression (2.A.35) leads to
$\zeta_{2p} = p - 2p(p-1)\lambda^2$, and by analytical continuation, the corresponding
full $\zeta_q$ spectrum is thus the parabola

$$ \zeta_q = \left(q - q(q-2)\lambda^2\right)/2 . \tag{2.A.36} $$
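
As an illustration of the Legendre transform (2.A.20) applied to the parabola (2.A.36), the sketch below computes $f(\alpha)$ by a brute-force minimization over a grid of $q$ and compares it with the closed form obtained by minimizing the quadratic in $q$ by hand, $f(\alpha) = 1 - (\alpha - 1/2 - \lambda^2)^2 / (2\lambda^2)$. The grid bounds and the value $\lambda^2 = 0.02$ are illustrative choices.

```python
def zeta(q, lam2):
    """MRW moment exponent (2.A.36)."""
    return (q - q * (q - 2.0) * lam2) / 2.0

def spectrum(alpha, lam2, q_min=-60.0, q_max=60.0, steps=12_000):
    """f(alpha) = 1 + min_q (q alpha - zeta(q)), by a grid search over q."""
    best = float("inf")
    for i in range(steps + 1):
        q = q_min + (q_max - q_min) * i / steps
        best = min(best, q * alpha - zeta(q, lam2))
    return 1.0 + best

def spectrum_closed_form(alpha, lam2):
    """Analytic minimization of the quadratic in q: a parabola in alpha."""
    return 1.0 - (alpha - 0.5 - lam2) ** 2 / (2.0 * lam2)

lam2 = 0.02
for alpha in (0.42, 0.52, 0.62):
    print(alpha, spectrum(alpha, lam2), spectrum_closed_form(alpha, lam2))
```

The spectrum peaks at $\alpha = 1/2 + \lambda^2$ with $f = 1$, and it collapses towards the single Brownian exponent 1/2 when $\lambda^2 \to 0$.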


Approximate form in Terms of a Long Memory Kernel 
in the Discrete Time Approximation 


Consider the returns at scale $\Delta t$, defined by
$r_{\Delta t}(t) = \ln[p(t)/p(t - \Delta t)]$. Then, mapping the increments
$\delta X_{\Delta t}[k]$ defined in (2.A.28) onto $r_{\Delta t}(t)$ makes the
price $p(t)$ a multifractal random walk in the continuous limit $\Delta t \to 0$. The
discrete return $r_{\Delta t}(t)$ can thus be written as

$$ r_{\Delta t}(t) = \epsilon(t) \cdot \sigma_{\Delta t}(t) = \epsilon(t) \cdot e^{\omega_{\Delta t}(t)} , \tag{2.A.37} $$

where $\epsilon(t)$ is a standardized Gaussian white noise independent of
$\omega_{\Delta t}(t)$ and $\omega_{\Delta t}(t)$ is a nearly Gaussian process (exactly
Gaussian for $\Delta t \to 0$) with mean and covariance:

$$ \mu_{\Delta t} = \frac{1}{2} \ln(\sigma^2 \Delta t) - C_{\Delta t}(0) , \tag{2.A.38} $$

$$ C_{\Delta t}(\tau) = {\rm Cov}[\omega_{\Delta t}(t), \omega_{\Delta t}(t+\tau)] = \begin{cases} \lambda^2 \ln\left(\dfrac{T}{|\tau| + e^{-3/2}\Delta t}\right) & \text{if } |\tau| < T - e^{-3/2}\Delta t , \\ 0 & \text{if } |\tau| \ge T - e^{-3/2}\Delta t . \end{cases} \tag{2.A.39} $$

$\sigma^2 \Delta t$ is the variance of the returns at scale $\Delta t$ and $T$ is the
“integral” (correlation) time scale. Typical values for $T$ and $\lambda^2$ are
respectively 1 year and 0.02.

The MRW model can be expressed in a more familiar form, in which the
log-volatility $\omega_{\Delta t}(t)$ obeys an autoregressive equation whose solution
reads [456]

$$ \omega_{\Delta t}(t) = \mu_{\Delta t} + \int_{-\infty}^{t} dW(\tau)\, K_{\Delta t}(t - \tau) , \tag{2.A.40} $$

where $W(t)$ denotes a standard Wiener process and the memory kernel
$K_{\Delta t}(\cdot)$ is a causal function, ensuring that the system is not anticipative.
The process $W(t)$ can be seen as the cumulative information flow. Thus $\omega(t)$
represents the response of the price to incoming information up to the date $t$.
At time $t$, the distribution of $\omega_{\Delta t}(t)$ is Gaussian with mean
$\mu_{\Delta t}$ and variance
$V_{\Delta t} = \int_0^\infty d\tau\, K_{\Delta t}^2(\tau) = \lambda^2 \ln\left(\frac{T e^{3/2}}{\Delta t}\right)$.
Its covariance, which entirely specifies the random process, is given by

$$ C_{\Delta t}(\tau) = \int_0^\infty dt\, K_{\Delta t}(t) K_{\Delta t}(t + |\tau|) . \tag{2.A.41} $$

Performing a Fourier transform, we obtain

$$ \hat{K}_{\Delta t}(f)^2 = \hat{C}_{\Delta t}(f) = 2\lambda^2 f^{-1} \left[\int_0^{Tf} \frac{\sin(t)}{t}\, dt + O\left(f \Delta t \ln(f \Delta t)\right)\right] , \tag{2.A.42} $$

which shows that, for $\tau$ small enough,

$$ K_{\Delta t}(\tau) \sim K_0 \sqrt{\frac{\lambda^2 T}{\tau}} \quad \text{for } \Delta t \ll \tau \ll T . \tag{2.A.43} $$

This slow inverse square root power law decay (2.A.43) of the memory kernel
in (2.A.40) ensures the long-range dependence and multifractality of the
stochastic volatility process (2.A.37). Note that (2.A.40) for the log-volatility
$\omega_{\Delta t}(t)$ takes a form similar to but simpler than the ARFIMA models often
used to account for the very slow decay of the sample ACF of the log-volatility
of asset returns [30].


2.B A Survey of the Properties 
of Maximum Likelihood Estimators 


This appendix summarizes the expressions of the maximum likelihood
estimators derived from the distributions (2.26-2.29) and (2.34). In the following,
we consider an iid sample $X_1, \ldots, X_T$ drawn from one of the distributions under
consideration, namely the Pareto, the Weibull, the Exponential, the incomplete
Gamma and the log-Weibull distributions.


2.B.1 The Pareto Distribution


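According to expression (2.26), the Pareto distribution is given by

$$ F_u(x) = 1 - \left(\frac{u}{x}\right)^b , \quad x \ge u , \tag{2.B.44} $$

and its density is

$$ f_u(x|b) = b\, \frac{u^b}{x^{b+1}} , \quad x \ge u . \tag{2.B.45} $$

Let us denote by

$$ L_T^{PD}(\hat{b}) = \max_b \sum_{i=1}^{T} \ln f_u(X_i|b) \tag{2.B.46} $$

the maximum of the log-likelihood function derived under hypothesis (PD).
$\hat{b}$ is the maximum likelihood estimator of the tail index $b$ under the PD
hypothesis.

The maximum of the likelihood function is the solution of

$$ \frac{1}{b} + \ln u - \frac{1}{T} \sum_{i=1}^{T} \ln X_i = 0 , \tag{2.B.47} $$

which yields

$$ \hat{b} = \left[\frac{1}{T} \sum_{i=1}^{T} \ln X_i - \ln u\right]^{-1} , \quad \text{and} \quad \frac{1}{T} L_T^{PD}(\hat{b}) = \ln\frac{\hat{b}}{u} - \left(1 + \frac{1}{\hat{b}}\right) . \tag{2.B.48} $$

Moreover, one easily shows that $\hat{b}$ is asymptotically normally distributed:

$$ \sqrt{T}(\hat{b} - b) \sim N(0, b) . \tag{2.B.49} $$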

2.B.2 The Weibull Distribution 


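The Weibull distribution is given by (2.27) and its density is

$$ f_u(x|c,d) = \frac{c}{d^c}\, e^{(u/d)^c}\, x^{c-1} \exp\left[-\left(\frac{x}{d}\right)^c\right] , \quad x \ge u . \tag{2.B.50} $$

The maximum of the log-likelihood function is

$$ L_T^{SE}(\hat{c}, \hat{d}) = \max_{c,d} \sum_{i=1}^{T} \ln f_u(X_i|c,d) . \tag{2.B.51} $$

Thus, the maximum likelihood estimators $(\hat{c}, \hat{d})$ are the solution of

$$ \frac{1}{c} = \frac{\frac{1}{T} \sum_{i=1}^{T} \left(\frac{X_i}{u}\right)^c \ln\frac{X_i}{u}}{\frac{1}{T} \sum_{i=1}^{T} \left(\frac{X_i}{u}\right)^c - 1} - \frac{1}{T} \sum_{i=1}^{T} \ln\frac{X_i}{u} , \tag{2.B.52} $$

$$ d^c = \frac{u^c}{T} \sum_{i=1}^{T} \left(\frac{X_i}{u}\right)^c - u^c . \tag{2.B.53} $$

Equation (2.B.52) depends on $c$ only and must be solved numerically. Then,
the resulting value of $c$ can be put in (2.B.53) to get $d$. The maximum of the
log-likelihood function is

$$ \frac{1}{T} L_T^{SE}(\hat{c}, \hat{d}) = \ln\frac{\hat{c}}{\hat{d}^{\hat{c}}} + \frac{\hat{c} - 1}{T} \sum_{i=1}^{T} \ln X_i - 1 . \tag{2.B.54} $$

Since $c > 0$, the vector $\sqrt{T}(\hat{c} - c, \hat{d} - d)$ is asymptotically normal, with a
covariance matrix whose expression is given in Appendix 2.C.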

It should be noted that the maximum likelihood equations (2.B.52-2.B.53) do
not admit a solution with positive $c$ for all possible samples $(X_1, \ldots, X_T)$.
Indeed, the function

$$ h(c) = \frac{1}{c} - \frac{\frac{1}{T} \sum_{i=1}^{T} \left(\frac{X_i}{u}\right)^c \ln\frac{X_i}{u}}{\frac{1}{T} \sum_{i=1}^{T} \left(\frac{X_i}{u}\right)^c - 1} + \frac{1}{T} \sum_{i=1}^{T} \ln\frac{X_i}{u} , \tag{2.B.55} $$

which is the total derivative of $L_T^{SE}(c, d(c))$, is a decreasing function of $c$. This
means, as one can expect, that the likelihood function is concave. Thus, a
necessary and sufficient condition for (2.B.52) to admit a solution is that $h(0)$
is positive. After some calculations, we find

$$ h(0) = \frac{2\left(\frac{1}{T} \sum_{i=1}^{T} \ln\frac{X_i}{u}\right)^2 - \frac{1}{T} \sum_{i=1}^{T} \ln^2\frac{X_i}{u}}{\frac{2}{T} \sum_{i=1}^{T} \ln\frac{X_i}{u}} , \tag{2.B.56} $$

which is positive if and only if

$$ 2\left(\frac{1}{T} \sum_{i=1}^{T} \ln\frac{X_i}{u}\right)^2 - \frac{1}{T} \sum_{i=1}^{T} \ln^2\frac{X_i}{u} > 0 . \tag{2.B.57} $$

A finite sample may not automatically obey this condition even if it has been
generated by the SE distribution. However, the probability of occurrence of a
sample leading to a negative maximum likelihood estimate of $c$ tends to zero
(under the SE Hypothesis with a positive $c$) as

$$ \Phi\left(-\frac{c\sqrt{T}}{\sigma}\right) \simeq \frac{\sigma}{\sqrt{2\pi T}\, c}\, e^{-\frac{c^2 T}{2\sigma^2}} , \tag{2.B.58} $$

i.e. exponentially with respect to $T$. Here, $\Phi$ is the standard normal
distribution function and $\sigma^2$ is the variance of the limit Gaussian
distribution of the maximum likelihood $c$-estimator, which can be derived
explicitly. If $h(0)$ is negative, $L_T^{SE}$ reaches its maximum at $c = 0$ and in
such a case

$$ \frac{1}{T} L_T^{SE}(c = 0) = -\ln\left(\frac{1}{T} \sum_{i=1}^{T} \ln\frac{X_i}{u}\right) - \frac{1}{T} \sum_{i=1}^{T} \ln X_i - 1 . \tag{2.B.59} $$

In contrast, if the maximum likelihood estimation based on the SE assumption
is applied to samples distributed differently from the SE, negative $c$-estimates
can then be obtained with some positive probability not tending to zero with
$T \to \infty$. If the sample is distributed according to the Pareto distribution,
for instance, then the maximum-likelihood $c$-estimate converges in probability
to a Gaussian random variable with zero mean, and thus the probability for
negative $c$-estimates converges to 0.5.
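
Since (2.B.52) has to be solved numerically, a minimal sketch of one possible procedure may be useful. It relies on the monotonicity of $h(c)$ stated above and uses a plain bisection (an implementation choice, not a method prescribed by the text), returning the boundary value $c = 0$ when no positive root is found. The synthetic sample, the starting bracket and the function names are illustrative assumptions.

```python
import math
import random

def h(c, logs):
    """Profile score (2.B.55), with logs[i] = ln(X_i / u)."""
    T = len(logs)
    num = sum(math.exp(c * y) * y for y in logs) / T
    den = sum(math.expm1(c * y) for y in logs) / T     # mean of (X_i/u)^c - 1
    return 1.0 / c - num / den + sum(logs) / T

def se_mle(sample, u, c_low=1e-3, tol=1e-8):
    """Solve h(c) = 0 by bisection, then recover d from (2.B.53)."""
    logs = [math.log(x / u) for x in sample]
    if h(c_low, logs) <= 0.0:
        return 0.0, None              # no positive root: boundary case c = 0
    c_high = 1.0
    while h(c_high, logs) > 0.0:      # h is decreasing: expand until it turns negative
        c_high *= 2.0
    while c_high - c_low > tol:
        c_mid = 0.5 * (c_low + c_high)
        if h(c_mid, logs) > 0.0:
            c_low = c_mid
        else:
            c_high = c_mid
    c_hat = 0.5 * (c_low + c_high)
    mean_excess = sum(math.expm1(c_hat * y) for y in logs) / len(logs)
    return c_hat, u * mean_excess ** (1.0 / c_hat)

rng = random.Random(5)
u, c_true, d_true, T = 1.0, 0.7, 1.0, 10_000
# Inverse transform of the survival function exp[-(x/d)^c + (u/d)^c] for x >= u.
sample = [d_true * ((u / d_true) ** c_true - math.log(1.0 - rng.random())) ** (1.0 / c_true)
          for _ in range(T)]
c_hat, d_hat = se_mle(sample, u)
print(c_hat, d_hat)
```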


2.B.3 The Exponential Distribution 


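The Exponential distribution function is given by (2.28), and its density is

$$ f_u(x|d) = \frac{\exp(u/d)}{d} \exp\left[-\frac{x}{d}\right] , \quad x \ge u . \tag{2.B.60} $$

The maximum of the log-likelihood function is reached at

$$ \hat{d} = \frac{1}{T} \sum_{i=1}^{T} X_i - u , \tag{2.B.61} $$

and is given by

$$ \frac{1}{T} L_T^{ED}(\hat{d}) = -(1 + \ln \hat{d}) . \tag{2.B.62} $$

The random variable $\sqrt{T}(\hat{d} - d)$ is asymptotically normally distributed with
zero mean and variance $d^2$ (that is, $\hat{d}$ has asymptotic variance $d^2/T$).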


2.B.4 The Incomplete Gamma Distribution 


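The expression of the incomplete Gamma distribution function is given by
(2.29) and its density is

$$ f_u(x|b,d) = \frac{d^b}{\Gamma(-b, u/d)}\, x^{-(b+1)} \exp\left[-\frac{x}{d}\right] , \quad x \ge u . \tag{2.B.63} $$

Let us introduce the partial derivative of the logarithm of the incomplete
Gamma function:

$$ \Psi(a,x) = \frac{\partial}{\partial a} \ln \Gamma(a,x) = \frac{1}{\Gamma(a,x)} \int_x^\infty dt\, \ln t\; t^{a-1} e^{-t} . \tag{2.B.64} $$

The maximum of the log-likelihood function is reached at the point $(\hat{b}, \hat{d})$
solution of

$$ \frac{1}{T} \sum_{i=1}^{T} \ln\frac{X_i}{d} = \Psi\left(-b, \frac{u}{d}\right) , \tag{2.B.65} $$

$$ \frac{1}{T} \sum_{i=1}^{T} \frac{X_i}{d} = \frac{1}{\Gamma\left(-b, \frac{u}{d}\right)} \left(\frac{u}{d}\right)^{-b} e^{-u/d} - b , \tag{2.B.66} $$

and is equal to

$$ \frac{1}{T} L_T^{IG}(\hat{b}, \hat{d}) = -\ln \hat{d} - \ln \Gamma\left(-\hat{b}, \frac{u}{\hat{d}}\right) - (\hat{b} + 1) \cdot \Psi\left(-\hat{b}, \frac{u}{\hat{d}}\right) + \hat{b} - \frac{1}{\Gamma\left(-\hat{b}, \frac{u}{\hat{d}}\right)} \left(\frac{u}{\hat{d}}\right)^{-\hat{b}} e^{-u/\hat{d}} . \tag{2.B.67} $$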


2.B.5 The Log-Weibull Distribution 


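The Log-Weibull distribution is given by (2.34) and its density is

$$ f_u(x|b,c) = \frac{b\, c}{x} \left(\ln\frac{x}{u}\right)^{c-1} \exp\left[-b \left(\ln\frac{x}{u}\right)^c\right] , \quad x \ge u . \tag{2.B.68} $$

The maximum of the log-likelihood function is

$$ L_T^{LW}(\hat{b}, \hat{c}) = \max_{b,c} \sum_{i=1}^{T} \ln f_u(X_i|b,c) . \tag{2.B.69} $$

Thus, the maximum likelihood estimators $(\hat{b}, \hat{c})$ are the solution of

$$ \frac{1}{b} = \frac{1}{T} \sum_{i=1}^{T} \left(\ln\frac{X_i}{u}\right)^c , \tag{2.B.70} $$

$$ \frac{1}{c} = \frac{\frac{1}{T} \sum_{i=1}^{T} \left(\ln\frac{X_i}{u}\right)^c \ln\left(\ln\frac{X_i}{u}\right)}{\frac{1}{T} \sum_{i=1}^{T} \left(\ln\frac{X_i}{u}\right)^c} - \frac{1}{T} \sum_{i=1}^{T} \ln\left(\ln\frac{X_i}{u}\right) . \tag{2.B.71} $$

The solution of these equations is unique and it can be shown that the vector
$\sqrt{T}(\hat{b} - b, \hat{c} - c)$ is asymptotically Gaussian with a covariance which can be
deduced from the matrix (2.C.88) given in Appendix 2.C.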


2.C Asymptotic Variance—Covariance 
of Maximum Likelihood Estimators of the SE Parameters 


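We consider the Stretched-Exponential (SE) parametric family with
complementary distribution function

$$ \bar{F}(x) = 1 - F(x) = \exp\left[-\left(\frac{x}{d}\right)^c + \left(\frac{u}{d}\right)^c\right] , \quad x \ge u , \tag{2.C.72} $$

where $c, d$ are unknown parameters and $u$ is a known lower threshold.

Let us take a new parameterization of the SE distribution, more appropriate
for the derivation of asymptotic variances. It should be noted that this
change of parameters does not affect the asymptotic variance of the form
parameter $c$. In the new parameterization, the complementary distribution
function has the form:

$$ \bar{F}(x) = \exp\left[-v\left(\left(\frac{x}{u}\right)^c - 1\right)\right] , \quad x \ge u . \tag{2.C.73} $$

Here, the parameter $v$ involves both the unknown parameters $c, d$ and the
known threshold $u$:

$$ v = \left(\frac{u}{d}\right)^c . \tag{2.C.74} $$

The log-likelihood $L$ for sample $(X_1, \ldots, X_N)$ has the form:

$$ L = N \ln v + N \ln c - N \ln u + (c - 1) \sum_{i=1}^{N} \ln\frac{X_i}{u} - v \sum_{i=1}^{N} \left[\left(\frac{X_i}{u}\right)^c - 1\right] . \tag{2.C.75} $$

Now, we derive the Fisher matrix $\Phi$:

$$ \Phi = \begin{pmatrix} E[-\partial_v^2 L] & E[-\partial^2_{v,c} L] \\ E[-\partial^2_{c,v} L] & E[-\partial_c^2 L] \end{pmatrix} \tag{2.C.76} $$

and find

$$ \frac{\partial^2 L}{\partial v^2} = -\frac{N}{v^2} , \tag{2.C.77} $$

$$ \frac{\partial^2 L}{\partial v\, \partial c} = -N \cdot \frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i}{u}\right)^c \ln\frac{X_i}{u} \simeq -N\, E\left[\left(\frac{X}{u}\right)^c \ln\frac{X}{u}\right] , \tag{2.C.78} $$

$$ \frac{\partial^2 L}{\partial c^2} = -\frac{N}{c^2} - N v \cdot \frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i}{u}\right)^c \ln^2\frac{X_i}{u} \simeq -\frac{N}{c^2} - N v\, E\left[\left(\frac{X}{u}\right)^c \ln^2\frac{X}{u}\right] . \tag{2.C.79} $$

After some calculations, we obtain:

$$ E\left[\left(\frac{X}{u}\right)^c \ln\frac{X}{u}\right] = \frac{1 + e^v E_1(v)}{c\, v} , \tag{2.C.80} $$

where $E_1(v)$ is the integral exponential function:

$$ E_1(v) = \int_v^\infty \frac{e^{-t}}{t}\, dt . \tag{2.C.81} $$

Similarly we find:

$$ E\left[\left(\frac{X}{u}\right)^c \ln^2\frac{X}{u}\right] = \frac{2 e^v}{v\, c^2} \left[E_1(v) + E_2(v) - \ln(v) E_1(v)\right] , \tag{2.C.82} $$

where $E_2(v)$ is the partial derivative of the incomplete Gamma function:

$$ E_2(v) = \int_v^\infty \frac{\ln(t)}{t}\, e^{-t}\, dt = \frac{\partial}{\partial a} \int_v^\infty t^{a-1} e^{-t}\, dt\, \Big|_{a=0} = \frac{\partial}{\partial a} \Gamma(a, v)\Big|_{a=0} . \tag{2.C.83} $$

The Fisher matrix, which carries an overall factor $N$, then reads:

$$ \Phi = N \begin{pmatrix} \dfrac{1}{v^2} & \dfrac{1 + e^v E_1(v)}{c\, v} \\ \dfrac{1 + e^v E_1(v)}{c\, v} & \dfrac{1}{c^2}\left(1 + 2 e^v \left[E_1(v) + E_2(v) - \ln(v) E_1(v)\right]\right) \end{pmatrix} . \tag{2.C.84} $$

The covariance matrix $B$ of the ML-estimates $(\hat{v}, \hat{c})$ is equal to the inverse
of the Fisher matrix. Thus, inverting the Fisher matrix $\Phi$ in (2.C.84) provides
the desired covariance matrix:

$$ B = \begin{pmatrix} \dfrac{v^2}{N H(v)} \left[1 + 2 e^v E_1(v) + 2 e^v E_2(v) - 2 \ln(v) e^v E_1(v)\right] & -\dfrac{c\, v}{N H(v)} \left[1 + e^v E_1(v)\right] \\ -\dfrac{c\, v}{N H(v)} \left[1 + e^v E_1(v)\right] & \dfrac{c^2}{N H(v)} \end{pmatrix} , \tag{2.C.85} $$

where $H(v)$ has the form:

$$ H(v) = 2 e^v E_2(v) - 2 \ln(v) e^v E_1(v) - \left(e^v E_1(v)\right)^2 . \tag{2.C.86} $$

We also present here the covariance matrix of the limit distribution of
ML-estimates for the SE distribution on the whole semi-axis $(0, \infty)$:

$$ 1 - F(x) = \exp(-g \cdot x^c) , \quad x \ge 0 . \tag{2.C.87} $$

After some calculations following the same steps as above, we find the
covariance matrix $B$ of the limit Gaussian distribution of ML-estimates $(\hat{g}, \hat{c})$:

$$ B = \frac{6}{N \pi^2} \begin{pmatrix} g^2 \left[\dfrac{\pi^2}{6} + (\gamma + \ln(g) - 1)^2\right] & g\, c\, [\gamma + \ln(g) - 1] \\ g\, c\, [\gamma + \ln(g) - 1] & c^2 \end{pmatrix} , \tag{2.C.88} $$

where $\gamma$ is the Euler constant: $\gamma \approx 0.577\,215\ldots$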
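
The matrices (2.C.84) and (2.C.85) can be evaluated numerically once $E_1(v)$ and $E_2(v)$ are available. The sketch below approximates both integrals by a composite Simpson rule on a truncated interval (a crude choice that is adequate for $v$ of order one, not a general-purpose routine) and checks that the product of the two matrices is the identity. The values of $v$, $c$ and $N$ are illustrative.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

def E1(v, cutoff=40.0):
    """Exponential integral (2.C.81), truncated at v + cutoff (an approximation)."""
    return simpson(lambda t: math.exp(-t) / t, v, v + cutoff)

def E2(v, cutoff=40.0):
    """Integral (2.C.83), truncated in the same way."""
    return simpson(lambda t: math.log(t) * math.exp(-t) / t, v, v + cutoff)

def fisher_and_covariance(v, c, N):
    """Fisher matrix (2.C.84) and covariance matrix (2.C.85) of (v, c)."""
    a, b2 = math.exp(v) * E1(v), math.exp(v) * E2(v)
    big = 1 + 2 * a + 2 * b2 - 2 * math.log(v) * a
    phi = [[N / v ** 2, N * (1 + a) / (c * v)],
           [N * (1 + a) / (c * v), N * big / c ** 2]]
    H = 2 * b2 - 2 * math.log(v) * a - a ** 2
    cov = [[v ** 2 * big / (N * H), -c * v * (1 + a) / (N * H)],
           [-c * v * (1 + a) / (N * H), c ** 2 / (N * H)]]
    return phi, cov

phi, cov = fisher_and_covariance(v=1.0, c=0.7, N=1000)
product = [[sum(phi[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(product)                                   # close to the 2 x 2 identity
print("standard error of c:", math.sqrt(cov[1][1]))
```
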
2.D Testing the Pareto Model versus 
the Stretched-Exponential Model 


This Appendix derives the statistic that allows one to test the SE hypothesis
$f_1(x|c,b)$ versus the Pareto hypothesis $f_0(x|\beta)$ on a semi-infinite interval
$(u, \infty)$, $u > 0$. The following parameterization is used:

$$ f_1(x|c,b) = b\, u^{-c} x^{c-1} \exp\left[-\frac{b}{c}\left(\left(\frac{x}{u}\right)^c - 1\right)\right] , \quad x \ge u , \tag{2.D.89} $$

for the Stretched-Exponential distribution and

$$ f_0(x|\beta) = \beta\, \frac{u^\beta}{x^{1+\beta}} , \quad x \ge u , \tag{2.D.90} $$

for the Pareto distribution.


Theorem: Assuming that the sample $X_1, \ldots, X_N$ is generated from the
Pareto distribution (2.D.90), and taking the supremum of the log-likelihoods
$L_0$ and $L_1$ of the Pareto and (SE) models respectively over the domains
$(\beta > 0)$ for $L_0$ and $(b > 0, c > 0)$ for $L_1$, then Wilks’ log-likelihood ratio
$W_N$:

$$ W_N = 2\left[\sup_{b,c} L_1 - \sup_{\beta} L_0\right] \tag{2.D.91} $$

is distributed according to the $\chi^2$-distribution with one degree of freedom, in
the limit $N \to \infty$.


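Proof
The log-likelihood $L_0$ reads

$$ L_0 = -\sum_{i=1}^{N} \log X_i + N \log(\beta) - \beta \sum_{i=1}^{N} \log\frac{X_i}{u} . \tag{2.D.92} $$

The supremum over $\beta$ of $L_0$ given by (2.D.92) is reached at

$$ \hat{\beta}_N = \left[\frac{1}{N} \sum_{i=1}^{N} \log\frac{X_i}{u}\right]^{-1} \tag{2.D.93} $$

and is equal to

$$ \sup_{\beta} L_0 = -N \left(1 + \frac{1}{N} \sum_{i=1}^{N} \log X_i - \log \hat{\beta}_N\right) . \tag{2.D.94} $$

The log-likelihood $L_1$ is

$$ L_1 = -N \left(\log u - (c - 1) \frac{1}{N} \sum_{i=1}^{N} \log\frac{X_i}{u} - \log b + \frac{b}{c}\, \frac{1}{N} \sum_{i=1}^{N} \left[\left(\frac{X_i}{u}\right)^c - 1\right]\right) . \tag{2.D.95} $$

The supremum over $b$ of $L_1$ given by (2.D.95) is reached at

$$ \hat{b}_N = \left[\frac{1}{N c} \sum_{i=1}^{N} \left(\left(\frac{X_i}{u}\right)^c - 1\right)\right]^{-1} \tag{2.D.96} $$

and is equal to

$$ \sup_{b} L_1 = -N \left(1 + \log u - (c - 1) \frac{1}{N} \sum_{i=1}^{N} \log\frac{X_i}{u} - \log \hat{b}_N\right) . \tag{2.D.97} $$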

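Taking the derivative of expression (2.D.97) with respect to $c$, we obtain the
maximum likelihood equation for the SE parameter $c$

$$ \frac{1}{c} = \frac{\frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i}{u}\right)^c \log\frac{X_i}{u}}{\frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i}{u}\right)^c - 1} - \frac{1}{N} \sum_{i=1}^{N} \log\frac{X_i}{u} . \tag{2.D.98} $$

If the sample $X_1, \ldots, X_N$ is generated by the Pareto distribution (2.D.90),
then by the strong law of large numbers, we have with probability 1 as
$N \to +\infty$

$$ \frac{1}{N} \sum_{i=1}^{N} \log\frac{X_i}{u} \longrightarrow E_0\left[\log\frac{X}{u}\right] = \frac{1}{\beta} , \tag{2.D.99} $$

$$ \frac{1}{N} \sum_{i=1}^{N} \left[\left(\frac{X_i}{u}\right)^c - 1\right] \longrightarrow E_0\left[\left(\frac{X}{u}\right)^c - 1\right] = \frac{c}{\beta - c} , \tag{2.D.100} $$

$$ \frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i}{u}\right)^c \log\frac{X_i}{u} \longrightarrow E_0\left[\left(\frac{X}{u}\right)^c \log\frac{X}{u}\right] = \frac{\beta}{(\beta - c)^2} , \tag{2.D.101} $$

where $E_0[\cdot]$ denotes the expectation with respect to $f_0(\cdot|\beta)$.

Inserting these limit values into (2.D.98), the only limit solution of this
equation is $c = 0$. Thus, the solution of (2.D.98) for finite $N$, denoted as
$\hat{c}_N$, converges to zero with probability 1 as $N \to +\infty$.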

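Expanding $(X_i/u)^c$ in power series in the neighborhood of $c = 0$ gives

$$ \left(\frac{X_i}{u}\right)^c = 1 + c \cdot \log\frac{X_i}{u} + \frac{c^2}{2} \log^2\frac{X_i}{u} + \frac{c^3}{6} \log^3\frac{X_i}{u} + \cdots , \tag{2.D.102} $$

which yields

$$ \frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i}{u}\right)^c \simeq 1 + c \cdot S_1 + \frac{c^2}{2} S_2 + \frac{c^3}{6} S_3 , \tag{2.D.103} $$

$$ \frac{1}{N} \sum_{i=1}^{N} \left(\frac{X_i}{u}\right)^c \log\frac{X_i}{u} \simeq S_1 + c \cdot S_2 + \frac{c^2}{2} S_3 , \tag{2.D.104} $$

where

$$ S_1 = \frac{1}{N} \sum_{i=1}^{N} \log\frac{X_i}{u} , \tag{2.D.105} $$

$$ S_2 = \frac{1}{N} \sum_{i=1}^{N} \log^2\frac{X_i}{u} , \tag{2.D.106} $$

$$ S_3 = \frac{1}{N} \sum_{i=1}^{N} \log^3\frac{X_i}{u} . \tag{2.D.107} $$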


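Putting these expansions into (2.D.96) and (2.D.98) and keeping only
terms in $c$ up to second order, the solutions of these equations read

$$ \hat{b}_N = S_1^{-1} \left(1 - \frac{S_2}{2 S_1}\, \hat{c}_N + \frac{3 S_2^2 - 2 S_1 S_3}{12 S_1^2}\, \hat{c}_N^2\right) , \qquad \hat{c}_N = \frac{\frac{1}{2} S_2 - S_1^2}{\frac{1}{2} S_1 S_2 - \frac{1}{3} S_3} . \tag{2.D.108} $$

Inserting these solutions into (2.D.97) and (2.D.94) gives

$$ \sup_{b,c} L_1 = -N \left[1 + \log u - (\hat{c}_N - 1) S_1 + \log S_1 + \frac{S_2}{2 S_1}\, \hat{c}_N - \frac{3 S_2^2 - 4 S_1 S_3}{24 S_1^2}\, \hat{c}_N^2\right] , \tag{2.D.109} $$

up to the second order in $\hat{c}_N$, and

$$ \sup_{\beta} L_0 = -N \left[1 + \log u + S_1 + \log S_1\right] , \tag{2.D.110} $$

which yields the explicit formula

$$ W_N = 2\left[\sup_{b,c} L_1 - \sup_{\beta} L_0\right] \tag{2.D.111} $$

$$ \simeq 2 N \hat{c}_N^2 \left(\frac{S_3}{6 S_1} - \frac{S_2}{2} + \frac{1}{8}\left(\frac{S_2}{S_1}\right)^2\right) . \tag{2.D.112} $$

Now by the law of large numbers, $S_1$ converges to $1/\beta$, $S_2$ converges to
$2/\beta^2$ and $S_3$ converges to $6/\beta^3$ with probability 1 as $N$ goes to infinity.
Thus, by the continuous mapping theorem

$$ \frac{S_3}{6 S_1} - \frac{S_2}{2} + \frac{1}{8}\left(\frac{S_2}{S_1}\right)^2 \xrightarrow{\rm a.s.} \frac{1}{2\beta^2} , \tag{2.D.113} $$

so that defining the variables $\xi_1 = S_1 - \beta^{-1}$, $\xi_2 = S_2 - 2\beta^{-2}$, we can
assert that

$$ W_N = \beta^2 N \left(2\xi_1 - \frac{\beta}{2}\, \xi_2\right)^2 + o_p(1) . \tag{2.D.114} $$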
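Now, accounting for the fact that

$$ \sqrt{N} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} \xrightarrow{\rm law} N\left(0, \begin{pmatrix} \beta^{-2} & 4\beta^{-3} \\ 4\beta^{-3} & 20\beta^{-4} \end{pmatrix}\right) , \tag{2.D.115} $$

we can write

$$ \sqrt{N}\, \frac{\beta}{2}\, \xi_2 = 2\left(\sqrt{N}\, \xi_1\right) + \frac{1}{\beta}\, \xi , \tag{2.D.116} $$

where $\xi$ is a Gaussian variable with zero mean and unit variance, independent
of $\xi_1$. This implies that

$$ W_N = \xi^2 + o_p(1) , \tag{2.D.117} $$

which means that Wilks’ statistic $W_N$ converges to a $\chi^2$-random variable with
one degree of freedom.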

3 


Notions of Copulas 


In this chapter, we introduce the notion of copulas, which describes the
dependence between several random variables. These variables can be the returns
of different assets or the value of a given asset at different times, and more
generally, any set of economic variables. We present some examples of
classical families of copulas and provide several illustrations of the usefulness of
copulas for actuarial,^1 economic, and financial applications.

Until relatively recently, the correlation coefficient was the measure of 
reference used to quantify the amplitude of dependence between two assets. 
From the old hypothesis or belief that the marginal distribution of returns is 
Gaussian, it was natural to extend this assumption of normality to the
multivariate domain. Recall that only under the assumption of multivariate
normality^2 is the correlation coefficient necessary and sufficient to capture the
full dependence structure between asset returns. The growing attacks of the 
past three decades and the now overwhelming evidence against the Gaussian 
hypothesis also cast doubts on the relevance of the correlation coefficient as 
an adequate measure of dependence. See for instance [404] for a specific test 
of multivariate normality of asset returns. Actually, it is now clear that the 
correlation coefficient is grossly insufficient to provide an accurate description 
of the dependence between two assets [64, 148, 149] and that it is necessary 
to characterize the full joint multivariate distribution of asset returns. This is 
all the more important for rare large events whose deviations from normality 
are the most extreme both in amplitude and dependence. 

1 Actuarial science is a sister discipline of statistics. Actuaries play an important
role in many of the financial plans that involve people, e.g., life insurance, pension
plans, retirement benefits, car insurance, unemployment insurance, and so on.

2 To some extent, the correlation coefficient also adequately quantifies the
dependence between elliptically distributed random variables, even if it may yield
spurious conclusions, especially in the far tails, as we shall see in the next chapters.


Consider for simplicity the problem of characterizing the bivariate
distribution of the returns of only two assets. It is essential to realize that the
bivariate distribution embodies two qualitatively different pieces of information on
the two assets. On the one hand, it contains the two marginal distributions;
on the other hand, it contains information on the dependence between the
two assets irrespective of their individual (marginal) distributions. Only the
introduction of the copula allows one to operate a clean dissection between
these two pieces of information. The role of the copula of two random
variables is precisely to offer a complete and unique description of the dependence
structure existing between them, excluding all (parasitic) information on the
marginal distribution of the random variables.

Such an approach in terms of copulas has witnessed a recent burst of interest
and of activity spurred by its practical and theoretical implications.
From an applied viewpoint, determining the dependence between assets is at
the core of risk management: the dependence governs (i) the optimization of
diversification of risks by aggregation in portfolios, (ii) the hedging strategies
based on derivatives, and (iii) the securitization³ of different risky instruments
to sell them to third parties. Specifically, the advantage of the copula
formulation is to provide a better understanding and quantification of the 
interactions between assets by determining the diverse dependence structures 
between the various sources of risk. Applications to finance include the calcu- 
lation of VaR (Value-at-Risk) and portfolio optimization [145], the calculation 
of option prices [99, 112], and credit risk [184, 186]. For various applications 
to insurance, see [110, 183, 478]. 
From a fundamental viewpoint, it is reasonable to think that the structure 
of dependence between assets reflects the underlying mechanisms at work in 
financial markets. In particular, the dependence between assets is in part the 
result of the interactions between the agents investing in the stock market⁴.
Not only are investors responsible for the individual variations and fluctua- 
tions of assets but, by their asset allocation choices (buying or selling such 
or such security rather than another), they also create dependence between 
assets. It can thus be hoped that the study of the dependence between assets 
may complement the understanding of the important mechanisms at work 
in stock markets and therefore of the interactions between agents. It should 
also help in narrowing down the relevant macroscopic parameters influencing 
investors in their asset allocation. 

Before presenting copulas and their fundamental properties, we should
stress that this body of results also applies when the structure of dependence
is time-varying. This remark is important since there is a priori no principle
or reason for the dependence to be constant [380, 409, 412]. One should thus
also study its dynamics. However, such a study of the time-dynamics
of the multivariate dependence structure between assets is extremely delicate
from both an empirical and a theoretical point of view. In addition, as we
show in Chap. 6, some apparent time-varying dependence may appear as a
spurious consequence of conditioning the measures of dependence on market
phases with large volatility, for instance. This mechanism appears to explain
a large part of the empirical observations on time-varying dependence,
suggesting that it would be sufficient to model the time-dependent properties of
volatility alone. We thus make the simplifying assumption that any possible
time-dependence of the statistical properties of assets is entirely embedded in
the evolution of the marginal distributions of their returns, while the
dependence structure between assets remains invariant.

3 The process of aggregating similar instruments, such as loans or mortgages, into
a negotiable security.

4 Of course, the observed dependence between assets also has inputs other than just
the action of economic agents on financial markets. Macroeconomic variables
also play an important role, especially for assets belonging to the same economic
sector, which are collectively sensitive to the same variations of the macroeconomic
landscape. This is the stance taken by factor models described in Chap. 1.


3.1 What is Dependence? 


The notion of independence of random variables is very easy to define. From 
elementary probability theory, two random variables X and Y are independent 
if and only if, for any x and y in the supports of the distributions, 


    $\Pr[X \le x;\, Y \le y] = \Pr[X \le x] \cdot \Pr[Y \le y]$,    (3.1)
or equivalently
    $\Pr[X \le x \mid Y] = \Pr[X \le x]$.    (3.2)


In other words, two random variables are independent if the knowledge of a 
piece of information about one of the random variables does not bring any 
new insight on the other one. 

The notion of dependence is much more subtle to define, or at least to 
quantify. Let us start with the concept of mutual complete dependence [290]. 
It seems natural that two real random variables X and X’ are mutually com- 
pletely dependent if the knowledge of X implies the knowledge of X’, and 
reciprocally. This statement simply means that there exists a one-to-one map- 
ping f such that: 


    $X' = f(X)$, almost everywhere,    (3.3)


which, as stressed in [270], implies the perfect predictability of one of the ran- 
dom variables from the other one. The mapping f is either strictly increasing 
or strictly decreasing. In the first case, the random variables are said to be 
comonotonic. 

In a second stage of our investigation of the concept of dependence, let us 
ask what could be the meaning of the following statement: 


The random variables X and Y exhibit the same dependence as the
random variables X’ and Y’.

A possible interpretation, explored in this chapter, is that the random
variables X and X’, on the one hand, and Y and Y’, on the other hand, are
comonotonic. In this case, all variables or functions describing the dependence 
between two (and more generally several) random variables should enjoy the 
property of invariance under an arbitrary increasing mapping. Let us assume 
that there exists a function C describing the dependence of the random
variables X and Y and a function C’ describing the dependence of the random
variables X’ and Y’. Writing that X and X’ (respectively Y and Y’) are 
comonotonic, 


    $X' = h_1(X)$,    (3.4)
    $Y' = h_2(Y)$,    (3.5)


where $h_1$ and $h_2$ are strictly increasing functions on $\mathbb{R}$ (if we consider
real-valued random variables), the property of invariance under strictly
increasing mapping reads C = C’.

Let us now show how to build C. Does the usual correlation coefficient 
qualify? While the correlation coefficient measures some kind of dependence, 
it is only able to account for a linear dependence.⁵ Therefore, it does not fulfill
the requirement for a general concept of dependence which should involve any 
nonlinear monotonic structure. Thus, we must look for something else. 

Let us consider the two random variables X and Y and their joint distri- 
bution function denoted by H: 


    $H(x,y) = \Pr[X \le x;\, Y \le y]$.    (3.6)

The marginal distributions of X and Y are respectively:

    $F(x) = \Pr[X \le x] = \lim_{t \to \infty} H(x,t)$,    (3.7)

    $G(y) = \Pr[Y \le y] = \lim_{t \to \infty} H(t,y)$.    (3.8)


For simplicity, let us assume that $F$ and $G$ are continuous and increasing, so
that the usual inverses $F^{-1}$ and $G^{-1}$ exist. Then, let us define


    $C(u,v) = H\left(F^{-1}(u), G^{-1}(v)\right)$,  $\forall u,v \in [0,1]$.    (3.9)


Let us now focus on the random variables X’ and Y’ given by (3.4-3.5) above. 
It is clear that their joint distribution function is 


    $H'(x,y) = \Pr[X' \le x;\, Y' \le y] = \Pr\left[X \le h_1^{-1}(x);\, Y \le h_2^{-1}(y)\right]$
    $= H\left(h_1^{-1}(x), h_2^{-1}(y)\right)$,    (3.10)

while their marginal distributions are:

    $F'(x) = \Pr[X' \le x] = F\left(h_1^{-1}(x)\right)$,    (3.11)
    $G'(y) = \Pr[Y' \le y] = G\left(h_2^{-1}(y)\right)$.    (3.12)

5 Indeed, consider the linear regression $Y = \beta X + \epsilon$ where $\beta$ is a constant and $X$
and $\epsilon$ are two independently distributed centered random variables with variances
Var(X) and Var($\epsilon$) respectively. Then, the knowledge of the covariance Cov(X,Y)
and of the variance Var(X) of X is equivalent to the knowledge of the linear
dependence between X and Y: Cov(X,Y) = $\beta$ Var(X). The correlation coefficient,
Corr(X,Y) = Cov(X,Y)/$\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}$ = $[1 + \mathrm{Var}(\epsilon)/(\beta^2 \mathrm{Var}(X))]^{-1/2}$ (for $\beta > 0$),
involves in addition an information on Var($\epsilon$).


Now, considering

    $C'(u,v) = H'\left(F'^{-1}(u), G'^{-1}(v)\right)$,  $\forall u,v \in [0,1]$,    (3.13)

elementary algebraic manipulations show that

    $C(u,v) = C'(u,v)$,  $\forall u,v \in [0,1]$.    (3.14)
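The algebraic manipulations behind (3.14) can be spelled out as follows. This is only a short sketch, assuming as above that $F$, $G$, $h_1$ and $h_2$ are continuous and strictly increasing, so that all the inverses involved exist:

```latex
\begin{align*}
F' = F \circ h_1^{-1} \;&\Longrightarrow\; F'^{-1}(u) = h_1\big(F^{-1}(u)\big), \qquad
G'^{-1}(v) = h_2\big(G^{-1}(v)\big), \\
C'(u,v) &= H'\big(F'^{-1}(u),\, G'^{-1}(v)\big) \\
        &= H\Big(h_1^{-1}\big(h_1(F^{-1}(u))\big),\, h_2^{-1}\big(h_2(G^{-1}(v))\big)\Big) \\
        &= H\big(F^{-1}(u),\, G^{-1}(v)\big) = C(u,v).
\end{align*}
```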


It turns out that the function C defined by (3.9) is the only object obeying the 
property of invariance under strictly increasing mapping and which entirely 
captures the full dependence between X and Y. 
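As a numerical illustration of this invariance, the following minimal Python sketch (not part of the original text) transforms a correlated Gaussian sample by two strictly increasing maps. The maps `exp` and the cube, the sample size, and the helper names `ranks` and `pearson` are hypothetical choices made for the example. The ranks, on which the empirical copula depends, stay the same, while the linear correlation coefficient changes.

```python
import math
import random


def ranks(values):
    """Rank of each observation (0 = smallest); ties are not expected here."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def pearson(a, b):
    """Sample linear correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.8 * xi + 0.6 * rng.gauss(0.0, 1.0) for xi in x]

# Hypothetical strictly increasing maps h1 = exp and h2 = cube.
x_prime = [math.exp(xi) for xi in x]
y_prime = [yi ** 3 for yi in y]

# The empirical copula only depends on the ranks, which are unchanged.
same_ranks = ranks(x) == ranks(x_prime) and ranks(y) == ranks(y_prime)

# The linear correlation coefficient, in contrast, is modified.
rho_before = pearson(x, y)
rho_after = pearson(x_prime, y_prime)
```

Comparing `rho_before` with `rho_after` shows why the correlation coefficient cannot play the role of the function C: it mixes the dependence structure with the marginal distributions.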

The following properties follow from simple calculations: 


• $C(u,1) = u$ and $C(1,v) = v$, $\forall u,v \in [0,1]$,
• $C(u,0) = C(0,v) = 0$, $\forall u,v \in [0,1]$,
• $C$ is 2-increasing, namely, for all $u_1 \le u_2$ and $v_1 \le v_2$:


    $C(u_2,v_2) - C(u_2,v_1) - C(u_1,v_2) + C(u_1,v_1) \ge 0$.    (3.15)


This last property is a simple translation of the nonnegativity of probabilities, 
specifically of the following expression: 


    $\Pr\left[F^{-1}(u_1) \le X \le F^{-1}(u_2);\, G^{-1}(v_1) \le Y \le G^{-1}(v_2)\right] \ge 0$.    (3.16)


As we shall see in the sequel, these three properties define the mathemati- 
cal object called copula, which has been introduced by A. Sklar in the late 
1950s [443] in order to describe the general dependence properties of random 
variables. 
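The construction (3.9) and the three properties above can be checked numerically on a concrete example. The sketch below is an illustration under stated assumptions: it uses Gumbel's bivariate exponential distribution, $H(x,y) = 1 - e^{-x} - e^{-y} + e^{-(x+y+\theta xy)}$ with unit exponential marginals, which is not discussed in the text, and the value `THETA = 0.5`, the grid and the function names are hypothetical choices.

```python
import math

THETA = 0.5  # hypothetical dependence parameter, must lie in [0, 1]


def joint_cdf(x, y):
    """H(x, y) of Gumbel's bivariate exponential law, for x, y >= 0."""
    if math.isinf(x):
        return 1.0 - math.exp(-y)  # limit x -> infinity: marginal G(y)
    if math.isinf(y):
        return 1.0 - math.exp(-x)  # limit y -> infinity: marginal F(x)
    return 1.0 - math.exp(-x) - math.exp(-y) + math.exp(-(x + y + THETA * x * y))


def marginal_inverse(u):
    """Inverse of the unit exponential marginal F(x) = 1 - exp(-x)."""
    return math.inf if u >= 1.0 else -math.log1p(-u)


def copula(u, v):
    """C(u, v) = H(F^{-1}(u), G^{-1}(v)), as in (3.9)."""
    return joint_cdf(marginal_inverse(u), marginal_inverse(v))


grid = [k / 10 for k in range(11)]
tol = 1e-12

# First property: C(u, 1) = u and C(1, v) = v.
margins_ok = all(
    abs(copula(u, 1.0) - u) < tol and abs(copula(1.0, u) - u) < tol for u in grid
)
# Second property: C(u, 0) = C(0, v) = 0.
grounded_ok = all(
    abs(copula(u, 0.0)) < tol and abs(copula(0.0, u)) < tol for u in grid
)
# Third property (3.15), checked on every cell of the grid.
two_increasing_ok = all(
    copula(u2, v2) - copula(u2, v1) - copula(u1, v2) + copula(u1, v1) >= -tol
    for u1, u2 in zip(grid, grid[1:])
    for v1, v2 in zip(grid, grid[1:])
)
```

A grid check of this kind is of course not a proof, but it is a convenient sanity test when a candidate copula is derived from a joint distribution.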


3.2 Definition and Main Properties of Copulas 


This section provides a brief survey of the main properties of copulas, empha- 
sizing the most important definitions and theorems useful in the following. For 
exhaustive and general presentations, we refer to [248, 370] and to [74, 183] 
for introductions oriented to financial and actuarial applications. 

The definition of a copula of n random variables generalizes the intuitive 
definition (3.9) presented above for the bivariate copula. 


Definition 3.2.1 (Copula). A function $C : [0,1]^n \to [0,1]$ is an n-copula
if it enjoys the following properties:


• $\forall u \in [0,1]$, $C(1,\ldots,1,u,1,\ldots,1) = u$,
• $\forall u_i \in [0,1]$, $C(u_1,\ldots,u_n) = 0$ if at least one of the $u_i$'s equals zero,
• $C$ is grounded and n-increasing, i.e., the C-volume of every box whose
  vertices lie in $[0,1]^n$ is non-negative.


It is clear from this definition that a copula is nothing but a multivariate
distribution with support in $[0,1]^n$ and with uniform marginals. It immediately
follows that a convex combination of copulas remains a copula. The fact that
such mathematical objects can be very useful for representing multivariate 
distributions with arbitrary marginals has been suggested in the previous in- 
troductory section and is stated more formally in the following result [443]. 


Theorem 3.2.1 (Sklar’s Theorem). Given an n-dimensional distribution
function $F$ with continuous⁶ (cumulative) marginal distributions $F_1, \ldots, F_n$,
there exists a unique n-copula $C : [0,1]^n \to [0,1]$ such that:

    $F(x_1, \ldots, x_n) = C\left(F_1(x_1), \ldots, F_n(x_n)\right)$.    (3.17)


Thus, the copula combines the marginals to form the multivariate distribution.
This theorem provides both a parameterization of multivariate distributions
and a construction scheme for copulas. Indeed, given a multivariate
distribution $F$ with marginals $F_1, \ldots, F_n$, the function

    $C(u_1, \ldots, u_n) = F\left(F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n)\right)$    (3.18)

is automatically an n-copula.⁷ This copula is the copula of the multivariate
distribution $F$. We will use this method in the sequel to derive the expressions
of standard copulas such as the Gaussian copula or the Student’s copula.

In addition to the copula itself, it is often very useful to consider the two 
following quantities: 


Definition 3.2.2. Given n random variables $X_1, \ldots, X_n$ with marginal survival
distributions $\bar F_1, \ldots, \bar F_n$ and joint survival distribution $\bar F$, the survival
copula $\bar C$ is such that:

    $\bar C\left(\bar F_1(x_1), \ldots, \bar F_n(x_n)\right) = \bar F(x_1, \ldots, x_n)$.    (3.19)


The dual copula $C^*$ of the copula $C$ of $X_1, \ldots, X_n$ is defined by:

    $C^*(u_1, \ldots, u_n) = 1 - \bar C(1 - u_1, \ldots, 1 - u_n)$,  $\forall u_1, \ldots, u_n \in [0,1]$.    (3.20)


6 When this assumption fails, Sklar’s theorem still holds, but in a weaker sense: a
representation like (3.17) still exists but is not unique anymore.

7 The quantile function, or generalized inverse, $F_i^{-1}$ of the distribution $F_i$ can be
defined by:
    $F_i^{-1}(u) = \inf\{x \mid F_i(x) \ge u\}$,  $\forall u \in (0,1)$.
When the distribution function $F_i$ is strictly increasing, $F_i^{-1}$ denotes the usual
inverse of $F_i$. In fact, any quantile function can be chosen. But, for noncontinuous
margins, the copula (3.18) depends upon the precise quantile function which is
selected.

While the survival copula is indeed a true copula, the dual copula is not.
However, it can be simply related to the probability that (at least) one of the
$X_i$'s is less than or equal to $x_i$. Indeed, one can easily check that:

    $\Pr\left[\bigcup_{i=1}^{n} \{X_i \le x_i\}\right] = C^*\left(F_1(x_1), \ldots, F_n(x_n)\right)$.    (3.21)


A very powerful property shared by all copulas is their invariance under 
arbitrary increasing mapping of the random variables (this has been shown 
for the case of the bivariate copulas in the derivation ending with (3.14)): 


Theorem 3.2.2 (Invariance Theorem). Consider n continuous random
variables $X_1, \ldots, X_n$ with copula $C$. Then, if $h_1, \ldots, h_n$ are strictly
increasing on the ranges of $X_1, \ldots, X_n$, the random variables $Y_1 = h_1(X_1), \ldots,$
$Y_n = h_n(X_n)$ have exactly the same copula $C$.


Let us stress again that this result demonstrates that the full dependence 
between the n random variables is completely captured by the copula,
independently of the shape of the marginal distributions. In other words, the
Invariance Theorem shows that the copula is an intrinsic measure of depen- 
dence between random variables. Under a monotonic change of variable from 
an old variable to a new variable, these two variables are comonotonic by 
definition. Intuitively, as explained in the previous section, it is natural that 
a measure of dependence between two random variables should be insensitive 
to the substitution of one of the variables by a comonotonic variable: if X and 
X’ are two comonotonic variables, one expects the same dependence structure
for the pair (X,Y) and for the pair (X’, Y). This is precisely the content
of the Invariance Theorem on copulas. In contrast, a measure of dependence
such as the correlation coefficient, which is a function of both the copula and
the marginal distributions, is not invariant under a monotonic change of variable.
It does not constitute an intrinsic measure of dependence (we will come
back to this point in detail in Chap. 4). The benefit of using copulas is the
decoupling between the marginal distribution and the dependence structure, 
which justifies the separate study of marginal distributions on the one hand 
and of the dependence on the other hand. 

Let us now state several useful properties enjoyed by copulas. First, any 
copula is uniformly continuous: 


Proposition 3.2.1. Given an n-copula $C$, for all $u_1, \ldots, u_n \in [0,1]$ and all
$v_1, \ldots, v_n \in [0,1]$:

    $|C(v_1, \ldots, v_n) - C(u_1, \ldots, u_n)| \le |v_1 - u_1| + \cdots + |v_n - u_n|$.    (3.22)


This result is a direct consequence of the property that copulas are
n-increasing. Indeed, restricting ourselves to the bivariate case for the simplicity
of the exposition, the triangle inequality implies

    $|C(v_1,v_2) - C(u_1,u_2)| = |C(v_1,v_2) - C(u_1,v_2) + C(u_1,v_2) - C(u_1,u_2)|$
    $\le |C(v_1,v_2) - C(u_1,v_2)| + |C(u_1,v_2) - C(u_1,u_2)|$,

and by (3.15), with some of the arguments put equal to 0 or 1, we have

    $|C(v_1,v_2) - C(u_1,v_2)| \le |v_1 - u_1|$,    (3.23)

and

    $|C(u_1,v_2) - C(u_1,u_2)| \le |v_2 - u_2|$,    (3.24)


which leads to the expected result. 
Besides, it follows that a copula is differentiable almost everywhere: 


Proposition 3.2.2. Let $C$ be an n-copula. For almost all $(u_1, \ldots, u_n) \in (0,1)^n$,
the partial derivative of $C$ with respect to $u_i$ exists and:

    $0 \le \frac{\partial C}{\partial u_i}(u_1, \ldots, u_n) \le 1$.    (3.25)


These two properties show that copulas enjoy nice regularity (or smoothness) 
conditions. In fact, the latter one will turn out to be very useful for numerical
simulations, as we shall see in Sect. 3.5. 

Due to the property that copulas are n-increasing, we can find an upper
and a lower bound for any copula. Choosing $u_2 = v_2 = 1$ in (3.15), we obtain
that any bivariate copula satisfies

    $C(u,v) \ge u + v - 1$.    (3.26)

Since, in addition, a copula is non-negative, we obtain a lower bound for any
bivariate copula:

    $C(u,v) \ge \max(u + v - 1, 0)$.    (3.27)

Similarly, choosing alternatively $(u_1 = 0, v_2 = 1)$ and $(u_2 = 1, v_1 = 0)$, we get
an upper bound for any bivariate copula

    $C(u,v) \le \min(u,v)$.    (3.28)

It is clear that these two bounds fulfill all the requirements of copulas,
qualifying the functions $\max(u + v - 1, 0)$ and $\min(u,v)$ as genuine bivariate
copulas. These two bounds are thus the tightest possible bounds. Generalization
to higher dimension is straightforward, so that we can state


Proposition 3.2.3 (Fréchet-Hoeffding Upper and Lower Bounds).
Given an n-copula $C$, for all $u_1, \ldots, u_n \in [0,1]$:

    $\max(u_1 + \cdots + u_n - n + 1, 0) \le C(u_1, \ldots, u_n) \le \min(u_1, \ldots, u_n)$.    (3.29)

Fig. 3.1. The Fréchet-Hoeffding lower (left panel) and upper (right panel) bounds
for bivariate copulas


These lower and upper bounds, which constitute the so-called Fréchet-Hoeffding
bounds, are represented in Fig. 3.1 for the bivariate case. The upper
bound is itself an n-copula, while the lower one is a copula only for n = 2.
However, this lower bound remains the best possible insofar as, for any fixed
point $(u_1, \ldots, u_n) \in [0,1]^n$, there exists a copula $C$ such that, at this particular
point:

    $C(u_1, \ldots, u_n) = \max(u_1 + \cdots + u_n - n + 1, 0)$.    (3.30)


The Fréchet-Hoeffding upper bound represents the strongest form of depen- 
dence that several random variables can exhibit. In fact, it is nothing but the 
copula associated with comonotonicity. Similarly, when n = 2, the Fréchet- 
Hoeffding lower bound is nothing but the copula of countermonotonicity. 
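The bounds (3.29), and the statement that the lower bound fails to be a copula beyond n = 2, can be illustrated with a few lines of Python. This is a minimal sketch: the product (independence) copula is used as an example of a copula lying between the bounds, and the function names and the box $[1/2, 1]^n$ are illustrative choices, not taken from the text.

```python
from itertools import product


def lower_bound(*u):
    """Fréchet-Hoeffding lower bound max(u_1 + ... + u_n - n + 1, 0)."""
    return max(sum(u) - len(u) + 1.0, 0.0)


def upper_bound(*u):
    """Fréchet-Hoeffding upper bound min(u_1, ..., u_n)."""
    return min(u)


def independence(*u):
    """Product copula, used here as an example of a copula between the bounds."""
    result = 1.0
    for ui in u:
        result *= ui
    return result


def c_volume(copula, lower, upper):
    """Signed sum of the copula over the 2^n vertices of the box [lower, upper]."""
    n = len(lower)
    total = 0.0
    for choice in product((0, 1), repeat=n):
        vertex = [upper[i] if pick else lower[i] for i, pick in enumerate(choice)]
        total += (-1) ** (n - sum(choice)) * copula(*vertex)
    return total


grid = [k / 20 for k in range(21)]
bounds_hold = all(
    lower_bound(u, v) - 1e-12 <= independence(u, v) <= upper_bound(u, v) + 1e-12
    for u in grid
    for v in grid
)

# n = 2: the lower bound assigns a non-negative volume to this box.
volume_2d = c_volume(lower_bound, (0.5, 0.5), (1.0, 1.0))
# n = 3: the same kind of box gets a negative volume, so the lower bound
# cannot be a copula in dimension three.
volume_3d = c_volume(lower_bound, (0.5, 0.5, 0.5), (1.0, 1.0, 1.0))
```

For the trivariate box the only nonzero vertex values are 1 at (1,1,1) and 1/2 at the three vertices with a single coordinate equal to 1/2, which gives a volume of 1 - 3/2 = -1/2.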


3.3 A Few Copula Families 


As shown by Sklar’s Theorem 3.2.1, for each multivariate distribution, one
can easily derive a copula. Notwithstanding their formidable number, a few 
copula families play a more important role. 


3.3.1 Elliptical Copulas 


Elliptical copulas derive from multivariate elliptical distributions [252]. Here, 
we give the two most important examples, the Gaussian and Student’s copulas. 
By construction, these two copulas are close to each other in their central part, 
and become closer and closer in their tail only when the number of degrees of 
freedom of the Student’s copula increases. As a consequence, it is sometimes 
difficult to distinguish between them, even with sensitive tests. However, as we 
shall see in Chap. 4, these two copulas may have drastically different behaviors 
with respect to the dependence between extremes.

Multiplicative factor models, which account for most of the stylized facts
observed on financial time series, generate distributions with elliptical copu- 
las. Multiplicative factor models contain in particular multivariate stochastic 
volatility models with a common stochastic volatility factor. They can be 
formulated as 


    $X = \sigma \cdot Y$,    (3.31)


where $\sigma$ is a positive random variable modeling the volatility, $Y$ is a Gaussian
random vector, independent of $\sigma$, and $X$ is the vector of asset returns. In this
framework, the multivariate distribution of asset returns $X$ is an elliptical
multivariate distribution. For instance, if the inverse $1/\sigma^2$ of the square of the
volatility $\sigma$ is a constant times a $\chi^2$-distributed random variable with $\nu$ degrees
of freedom, the distribution of asset returns will be the Student distribution
with $\nu$ degrees of freedom. When the volatility follows ARCH or GARCH
processes, then the asset returns are also elliptically distributed with fat-tailed
marginal distributions. Such an elliptical multivariate distribution ensures that
each asset $X_i$ is asymptotically distributed according to a regularly varying
distribution:⁸ $\Pr\{|X_i| > x\} \sim L(x) \cdot x^{-\nu}$, where $L(\cdot)$ denotes a slowly varying
function, with the same exponent $\nu$ for all assets.

Elliptical copulas have the advantage of being easily synthesized numer- 
ically, which makes their use convenient for numerical simulations and for 
the study of scenarios. This results from the fact that it is easy to generate 
Gaussian or Student’s distributed random variables which, upon appropriate 
monotonic changes of variables, give the correct marginal distributions while
leaving the copula unchanged.


The Gaussian Copula 


The Gaussian copula is the copula derived from the multivariate Gaussian 
distribution. The Gaussian copula provides a natural setting for generaliz- 
ing Gaussian multivariate distributions into so-called meta-Gaussian distrib- 
utions. Meta-Gaussian distributions have been introduced in [283] (see [163] 
for a generalization to meta-elliptical distributions) and have been applied in 
many areas, from the analysis of experiments in high-energy particle physics 
[265] to finance [453]. These meta-Gaussian distributions have exactly the 
same dependence structure as the Gaussian distributions while differing in 
their marginal distributions which can be arbitrary. 

Let $\Phi$ denote the standard Normal (cumulative) distribution and $\Phi_{\rho,n}$
the n-dimensional standard Gaussian distribution with correlation matrix $\rho$.
Then, the Gaussian n-copula with correlation matrix $\rho$ is

    $C_{\rho,n}(u_1, \ldots, u_n) = \Phi_{\rho,n}\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n)\right)$,    (3.32)

whose density (see Fig. 3.2)

    $c_{\rho,n}(u_1, \ldots, u_n) = \frac{\partial^n C_{\rho,n}(u_1, \ldots, u_n)}{\partial u_1 \cdots \partial u_n}$    (3.33)

reads

    $c_{\rho,n}(u_1, \ldots, u_n) = \frac{1}{\sqrt{\det \rho}} \exp\left(-\frac{1}{2}\, y^t(u) \left(\rho^{-1} - \mathrm{Id}\right) y(u)\right)$    (3.34)

with $y^t(u) = \left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n)\right)$. Note that Theorem 3.2.1 and equation
(3.18) ensure that $C_{\rho,n}(u_1, \ldots, u_n)$ in (3.32) is a copula.

Fig. 3.2. Contour plot of the density (3.34) of the bivariate Gaussian copula with
a correlation coefficient $\rho = 0.8$ (left panel) and $\rho = -0.8$ (right panel)

8 See footnote 3 page 39.

The Gaussian copula is completely determined by the knowledge of the cor- 
relation matrix p. The parameters involved in the description of the Gaussian 
copula are simple to estimate, as we shall see in Chap. 5. 
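To make the bivariate case concrete, here is a minimal Python sketch (standard library only) of the density (3.34) for n = 2 and of a sampler for the Gaussian copula, followed by a change of marginals that produces a meta-Gaussian sample. The function names, the value of the correlation coefficient and the choice of unit exponential marginals are assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal distribution Phi


def gaussian_copula_density(u1, u2, rho):
    """Density (3.34) for n = 2, with y = (Phi^{-1}(u1), Phi^{-1}(u2))."""
    y1, y2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    # y^t (rho^{-1} - Id) y written out for a 2 x 2 correlation matrix.
    quad = (rho * rho * (y1 * y1 + y2 * y2) - 2.0 * rho * y1 * y2) / (1.0 - rho * rho)
    return math.exp(-0.5 * quad) / math.sqrt(1.0 - rho * rho)


def sample_gaussian_copula(rho, size, rng):
    """Draw pairs (U1, U2) whose copula is the Gaussian copula with parameter rho."""
    pairs = []
    for _ in range(size):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)))
    return pairs


rng = random.Random(1)
uniforms = sample_gaussian_copula(0.8, 5000, rng)

# Meta-Gaussian sample: same copula, unit exponential marginals (x = -ln(1 - u)).
meta_gaussian = [(-math.log1p(-u1), -math.log1p(-u2)) for u1, u2 in uniforms]

mean_u1 = sum(u1 for u1, _ in uniforms) / len(uniforms)
mean_x1 = sum(x1 for x1, _ in meta_gaussian) / len(meta_gaussian)
```

At the center of the unit square the quadratic form vanishes, so the density reduces to $1/\sqrt{1-\rho^2}$, and for $\rho = 0$ it is identically equal to one, as expected for the independence copula.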


Student’s Copula 


Student’s copula is derived from Student’s multivariate distribution. It pro- 
vides a natural generalization of Student’s multivariate distributions, in the 
form of meta-elliptical distributions [163]. These meta-elliptical distributions 
have exactly the same dependence structure as the Student’s distributions 
while differing in their marginal distributions which can be arbitrary. 

Given an n-dimensional Student distribution $T_{n,\rho,\nu}$ with $\nu$ degrees of
freedom and a shape matrix $\rho$⁹

    $T_{n,\rho,\nu}(x) = \frac{1}{\sqrt{\det \rho}} \frac{\Gamma\left(\frac{\nu+n}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\pi\nu)^{n/2}} \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} \frac{dx_1 \cdots dx_n}{\left(1 + \frac{x^t \rho^{-1} x}{\nu}\right)^{\frac{\nu+n}{2}}}$,    (3.35)

the corresponding Student’s copula reads:

    $C_{n,\rho,\nu}(u_1, \ldots, u_n) = T_{n,\rho,\nu}\left(T_\nu^{-1}(u_1), \ldots, T_\nu^{-1}(u_n)\right)$,    (3.36)

where $T_\nu$ is the univariate Student’s distribution with $\nu$ degrees of freedom.
The density of the Student’s copula is thus

    $c_{n,\rho,\nu}(u_1, \ldots, u_n) = \frac{1}{\sqrt{\det \rho}} \frac{\Gamma\left(\frac{\nu+n}{2}\right) \left[\Gamma\left(\frac{\nu}{2}\right)\right]^{n-1}}{\left[\Gamma\left(\frac{\nu+1}{2}\right)\right]^{n}} \frac{\prod_{k=1}^{n} \left(1 + \frac{y_k^2}{\nu}\right)^{\frac{\nu+1}{2}}}{\left(1 + \frac{y^t \rho^{-1} y}{\nu}\right)^{\frac{\nu+n}{2}}}$,    (3.37)

where $y^t = \left(T_\nu^{-1}(u_1), \ldots, T_\nu^{-1}(u_n)\right)$. See also Fig. 3.3.
Since Student’s distribution tends to the normal distribution when $\nu$ goes
to infinity, Student’s copula tends to the Gaussian copula as $\nu \to +\infty$ [350]:

    $\sup_{u \in [0,1]^n} \left|C_{n,\rho,\nu}(u) - C_{\rho,n}(u)\right| \to 0$,  as $\nu \to +\infty$.    (3.38)

Fig. 3.3. Contour plot of the density (3.37) of a bivariate Student t copula with
a shape parameter $\rho = 0.8$ and $\nu = 2$ degrees of freedom (left panel) or $\nu = 10$
degrees of freedom (right panel). For small $\nu$'s, the difference between the Student
copula and the Gaussian copula is striking on both diagonals. As $\nu$ increases, this
difference decreases on the second diagonal but remains large (for $\nu = 10$) on the
main diagonal, as can be observed by comparing the right panel above with the left
panel of Fig. 3.2

9 Note that the shape matrix $\rho$ is nothing but the correlation matrix when the
number of degrees of freedom $\nu$ is larger than 2, namely when the second moments
of the variables $X_i$'s exist.


The description of a Student copula relies on two parameters: the shape
matrix $\rho$, as in the Gaussian case, and in addition the number of degrees
of freedom $\nu$. An accurate estimation of the parameter $\nu$ is rather difficult
and this can have an important impact on the estimated value of the shape
matrix.¹⁰ As a consequence, the Student’s copula may be more difficult to
calibrate and to use than the Gaussian copula.
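The difference between the two elliptical copulas in the tails can be seen on simulated data. The sketch below is an illustration under stated assumptions, not a calibration procedure: it draws Student pairs through the multiplicative factor representation (3.31), maps both samples to pseudo-uniform variables through their ranks (a substitute for the exact marginal distribution functions, which leaves the copula unchanged), and counts joint exceedances of the 99% level. The parameter values, the seed and the function names are hypothetical.

```python
import math
import random


def sample_elliptical(rho, nu, size, rng):
    """Draw pairs X = sigma * Y as in (3.31); nu=None keeps sigma = 1 (Gaussian)."""
    pairs = []
    for _ in range(size):
        y1 = rng.gauss(0.0, 1.0)
        y2 = rho * y1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if nu is None:
            sigma = 1.0
        else:
            chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square, nu degrees of freedom
            sigma = math.sqrt(nu / chi2)            # i.e. 1 / sigma^2 = chi2 / nu
        pairs.append((sigma * y1, sigma * y2))
    return pairs


def pseudo_observations(pairs):
    """Map each margin to (0, 1) through its ranks, leaving the copula unchanged."""
    n = len(pairs)
    columns = []
    for column in zip(*pairs):
        order = sorted(range(n), key=column.__getitem__)
        u = [0.0] * n
        for position, index in enumerate(order):
            u[index] = (position + 1) / (n + 1)
        columns.append(u)
    return list(zip(*columns))


def joint_upper_tail_count(uniforms, level):
    """Number of observations for which both components exceed the given level."""
    return sum(1 for u1, u2 in uniforms if u1 > level and u2 > level)


rng = random.Random(42)
size, rho = 20000, 0.5
student_u = pseudo_observations(sample_elliptical(rho, 3, size, rng))
gaussian_u = pseudo_observations(sample_elliptical(rho, None, size, rng))

student_tail = joint_upper_tail_count(student_u, 0.99)
gaussian_tail = joint_upper_tail_count(gaussian_u, 0.99)
```

With the same shape parameter, the Student sample should show noticeably more joint extremes than the Gaussian one, because the common volatility factor $\sigma$ hits both components at the same time.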


3.3.2 Archimedean Copulas 


The importance of this class of copulas lies in that it encompasses a very 
large number of copulas while enjoying a certain number of interesting prop- 
erties. In addition, as pointed out by Frees and Valdez [183], a large number 
of models developed to account for the dependence between various sources 
of risks in the theory of insurance lead to Archimedean copulas. The factor 
models constitute, however, a notable exception. While linear factor models 
play a fundamental role in the phenomenological description of interactions 
between financial assets, Archimedean copulas are not adequate to describe 
their corresponding dependence structure. In the same vein, the Gaussian and 
Student’s copulas, as well as any elliptical copula, are not Archimedean. 
An Archimedean copula is defined as follows: 


Definition 3.3.1 (Archimedean Copula). Let $\varphi$ be a continuous, strictly
decreasing, convex function from [0,1] to [0,∞] such that $\varphi(1) = 0$.
Let $\varphi^{[-1]}$ be the pseudo-inverse of $\varphi$:

    $\varphi^{[-1]}(t) = \varphi^{-1}(t)$ if $0 \le t \le \varphi(0)$, and $\varphi^{[-1]}(t) = 0$ if $t \ge \varphi(0)$;    (3.39)

then the function

    $C(u,v) = \varphi^{[-1]}\left(\varphi(u) + \varphi(v)\right)$    (3.40)

is an Archimedean copula with generator $\varphi$.


The generalization to an n-copula seems straightforward:

    $C_n(u_1, \ldots, u_n) = \varphi^{[-1]}\left(\varphi(u_1) + \cdots + \varphi(u_n)\right)$.    (3.41)

However, this formulation holds, i.e., $C_n$ is actually an n-Archimedean copula,
if and only if $\varphi^{[-1]}$ is n-monotonic:

    $(-1)^k \frac{d^k \varphi^{[-1]}(t)}{dt^k} \ge 0$,  $\forall k = 0, 1, \ldots, n$.    (3.42)

When this latter relation holds for all $n \in \mathbb{N}$, $\varphi^{[-1]}$ is said to be completely
monotonic. In such a case, the bivariate Archimedean copula can be generalized
to any dimension.


10 Lindskog et al. [307] have recently introduced a robust estimation technique for
the calibration of the shape matrix of any elliptical copula, which is described in
Chap. 5.

Fig. 3.4. Contour plot of Clayton’s copula (left panel) and contour plot of its
density (right panel) for parameter value $\theta = 1$


The complexity of the dependence structure between n variables, usually
described by a function of n variables, is reduced and embedded, for
Archimedean copulas, into the function of a single variable, the generator
$\varphi$. This transforms a multidimensional formulation into a much simpler
one-dimensional one.

Among the large number of copulas in the Archimedean family, the fol- 
lowing copulas can be mentioned: 


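• Clayton’s copula, which plays the role of a limit copula (see (3.61)):

    $C_\theta^{\mathrm{Cl}}(u,v) = \max\left(\left[u^{-\theta} + v^{-\theta} - 1\right]^{-1/\theta}, 0\right)$,  $\theta \in [-1, \infty)$    (3.43)

  with generator $\varphi(t) = \frac{1}{\theta}\left(t^{-\theta} - 1\right)$,

• Gumbel’s copula, which plays a special role in the description of dependence
  using extreme value theory (see next Sect. 3.3.3):

    $C_\theta^{\mathrm{G}}(u,v) = \exp\left(-\left[(-\ln u)^\theta + (-\ln v)^\theta\right]^{1/\theta}\right)$,  $\theta \in [1, \infty)$    (3.44)

  with generator $\varphi(t) = (-\ln t)^\theta$,

• Frank’s copula:

    $C_\theta^{\mathrm{F}}(u,v) = -\frac{1}{\theta} \ln\left(1 + \frac{\left(e^{-\theta u} - 1\right)\left(e^{-\theta v} - 1\right)}{e^{-\theta} - 1}\right)$,  $\theta \in \mathbb{R}$    (3.45)

  with generator $\varphi(t) = -\ln \frac{e^{-\theta t} - 1}{e^{-\theta} - 1}$.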


Note that the bivariate Fréchet-Hoeffding lower bound is an Archimedean
copula, while the upper bound copula is not. For an overview of the members
of the Archimedean family, we refer to Table 4.1 in [370].

Fig. 3.5. Contour plot of Frank’s copula (left panel) and contour plot of its density
(right panel) for parameter value $\theta = 1$
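The role of the generator can be made explicit in code. The following minimal Python sketch builds a bivariate Archimedean copula from a strict generator and its inverse, as in (3.40), and compares the result with the closed forms (3.43)-(3.45). The parameter values, the grid and the function names are hypothetical, and only strict generators are covered, so that the pseudo-inverse coincides with the ordinary inverse.

```python
import math


def archimedean(generator, inverse):
    """Bivariate Archimedean copula (3.40) built from a strict generator."""
    return lambda u, v: inverse(generator(u) + generator(v))


theta_c, theta_g, theta_f = 1.0, 2.0, 1.0  # hypothetical parameter values

clayton = archimedean(
    lambda t: (t ** -theta_c - 1.0) / theta_c,
    lambda s: (1.0 + theta_c * s) ** (-1.0 / theta_c),
)
gumbel = archimedean(
    lambda t: (-math.log(t)) ** theta_g,
    lambda s: math.exp(-(s ** (1.0 / theta_g))),
)
frank = archimedean(
    lambda t: -math.log(math.expm1(-theta_f * t) / math.expm1(-theta_f)),
    lambda s: -math.log1p(math.exp(-s) * math.expm1(-theta_f)) / theta_f,
)


def clayton_closed(u, v):
    """Closed form (3.43), valid as written for theta > 0."""
    return (u ** -theta_c + v ** -theta_c - 1.0) ** (-1.0 / theta_c)


def gumbel_closed(u, v):
    """Closed form (3.44)."""
    inner = (-math.log(u)) ** theta_g + (-math.log(v)) ** theta_g
    return math.exp(-(inner ** (1.0 / theta_g)))


def frank_closed(u, v):
    """Closed form (3.45)."""
    ratio = math.expm1(-theta_f * u) * math.expm1(-theta_f * v) / math.expm1(-theta_f)
    return -math.log1p(ratio) / theta_f


grid = [k / 10 for k in range(1, 10)]  # interior points 0.1, ..., 0.9
max_gap = max(
    max(
        abs(clayton(u, v) - clayton_closed(u, v)),
        abs(gumbel(u, v) - gumbel_closed(u, v)),
        abs(frank(u, v) - frank_closed(u, v)),
    )
    for u in grid
    for v in grid
)
```

The same `archimedean` helper can be reused with any other strict generator, which is the practical content of the remark that the whole dependence structure is carried by a function of a single variable.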


A general procedure for constructing generators of Archimedean copulas
has been proposed by Marshall and Olkin [348]. They have proved that, given
a distribution function $F$ defined on $\mathbb{R}_+$ such that $F(0) = 0$, the inverse
$\varphi(t) = \phi^{-1}(t)$ of the Laplace transform of $F$


    $\phi(t) = \int_0^\infty e^{-tx}\, dF(x)$    (3.46)


is the generator of an Archimedean copula. 

This suggests that frailty models [236, 480] can provide a natural mecha- 
nism for generating random variables with Archimedean copulas. Such models 
are common in actuarial science, because they offer a simple way to study the 
joint mortality of a group of individuals sharing common risk factors (see 
[103, 182, 237] among many others). In finance, they can also model the joint 
distribution of defaults of different obligors subjected to the same set of
economic factors. 

In each case, one focuses on the continuous random variables $T_i$ representing
the survival time of the $i$th individual or company, i.e., the time before
death or default. Their individual survival distributions are defined by

    $S_i(t) = \Pr[T_i > t]$,    (3.47)

with hazard rate:

    $h_i(t) = -\frac{d}{dt} \ln S_i(t)$.    (3.48)

Conditional on a p-dimensional random vector $Z$ representing the risk factors,
one can use a proportional hazard model [111], with the $i$th conditional hazard
rate given by

    $h_i(t|Z) = e^{\beta^t Z}\, b_i(t)$,    (3.49)

where the $b_i(t)$'s are the base-line hazard rates and $\beta$ is the vector of regression
parameters (the same for all individuals).

Defining the frailty variable $V = e^{\beta^t Z}$ and integrating the conditional hazard
rates, one obtains the expression of the conditional survival distributions:

    $S_i(t|V = v) = e^{-v f_i(t)}$,  where  $f_i(t) = \int_0^t b_i(s)\, ds$.    (3.50)


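Then, assuming that $V$ has the distribution function $F$ with Laplace transform
$\phi = \varphi^{-1}$ (cf. (3.46)) and that the $T_i$'s are independent conditionally on $V$,
the joint survival distribution of the $T_i$'s is given by

    $\Pr[T_1 > t_1, \ldots, T_n > t_n] = E^V\left[S_1(t_1|V) \cdots S_n(t_n|V)\right]$,
    $= E^V\left[e^{-V (f_1(t_1) + \cdots + f_n(t_n))}\right]$,
    $= \int_0^\infty e^{-v (f_1(t_1) + \cdots + f_n(t_n))}\, dF(v)$,
    $= \varphi^{-1}\left(f_1(t_1) + \cdots + f_n(t_n)\right)$.    (3.51)

Since the unconditional marginal survival function of a given $T_i$ reads

    $S_i(t_i) = E^V\left[S_i(t_i|V)\right] = \varphi^{-1}\left(f_i(t_i)\right)$,    (3.52)

Sklar’s theorem shows that the (survival) copula of all the $T_i$'s is:

    $C(u_1, \ldots, u_n) = \varphi^{-1}\left(\varphi(u_1) + \cdots + \varphi(u_n)\right)$,    (3.53)

which is Archimedean, as expected.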

As an example, let us consider Clayton’s copula. Equation (3.43) shows
that its generator is, up to a positive multiplicative factor which leaves the
copula unchanged, $\varphi(t) = t^{-\theta} - 1$, so that $\phi(t) = \varphi^{-1}(t) = (1+t)^{-1/\theta}$, which is
precisely the Laplace transform of a Gamma distribution $\Gamma(\theta^{-1}, 1)$ with
parameter $1/\theta$, $\theta > 0$. As a consequence, considering a frailty variable $V$ following
a Gamma distribution with parameters $(1/\theta, 1)$, $\theta > 0$, and n conditionally
independent random variables $U_i|V$, with conditional law:

    $\Pr[U_i \le u_i | V = v] = \exp\left[v \left(1 - u_i^{-\theta}\right)\right]$,  $u_i \in [0,1]$,    (3.54)

one obtains n uniformly distributed random variables $U_i$, whose dependence
structure is the Clayton copula with parameter $\theta$.
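This frailty construction translates directly into a simulation recipe. The sketch below is a minimal illustration for $\theta > 0$: it draws the Gamma frailty, inverts the conditional law (3.54) with independent unit exponential variables, and compares the empirical value of the copula at (1/2, 1/2) with the closed form (3.43). The value of `theta`, the seed, the sample size and the function names are assumptions made for the example.

```python
import random


def sample_clayton(theta, dimension, rng):
    """One draw of (U_1, ..., U_n) with Clayton copula, theta > 0, via a Gamma frailty."""
    v = rng.gammavariate(1.0 / theta, 1.0)  # frailty V ~ Gamma(1/theta, 1)
    # Given V = v, Pr[U_i <= u] = exp(v (1 - u^-theta)). Setting this equal to
    # exp(-E), with E a unit exponential variable, gives U_i = (1 + E / v)^(-1/theta).
    return tuple(
        (1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta) for _ in range(dimension)
    )


def clayton(u, v, theta):
    """Closed form (3.43) of the bivariate Clayton copula for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


rng = random.Random(7)
theta = 2.0
draws = [sample_clayton(theta, 2, rng) for _ in range(20000)]

# Each margin should be uniform on [0, 1].
mean_u1 = sum(u1 for u1, _ in draws) / len(draws)
# Empirical copula at (0.5, 0.5) against the closed form.
empirical = sum(1 for u1, u2 in draws if u1 <= 0.5 and u2 <= 0.5) / len(draws)
theoretical = clayton(0.5, 0.5, theta)
```

The same scheme works in any dimension, since all components share the single frailty draw `v`, which is what makes the resulting variables exchangeable.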
Archimedean copulas enjoy the important property of associativity: 


    $C_3(u,v,w) = C_2(u, C_2(v,w)) = C_2(C_2(u,v), w)$,    (3.55)


where $C_2$ and $C_3$ respectively denote the bivariate and trivariate forms of
the copula under consideration. This property derives straightforwardly from
(3.41). In other words, given three random variables U, V and W, the dependence
between the first two random variables taken together and the third one
alone is the same as the dependence between the first random variable taken
alone and the two last ones together. Therefore, if the dependence of the three
random variables is described by an Archimedean copula, this implies a strong
symmetry between the different variables in that they are exchangeable. As a
consequence, when there is no reason to expect a breaking of symmetry between
the random variables, an Archimedean copula may be a good choice to
model their dependence. Such an assumption is often used in modeling large
credit baskets. Conversely, when the random variables play very different
roles, namely when they are not exchangeable, Archimedean copulas do not
provide valid models of their dependence.

Another interesting property of Archimedean copulas is that their values
$C(u,u)$ on the first bisectrix verify the following inequality:


    C(u, u) < u ,  for all u \in (0,1) .    (3.56)


Reciprocally, one can demonstrate [370, Theorem 4.1.6] that any copula
possessing these two properties (associativity and $C(u,u) < u$) is Archimedean.
This provides an intuitive understanding of the nature of Archimedean copulas.
It also allows one to understand why the Fréchet-Hoeffding upper bound
copula is not Archimedean. Indeed, although it enjoys the associativity
property, the Fréchet-Hoeffding upper bound is such that $C(u,u) = u$ for all
$u \in [0,1]$ (note that it is the only copula with this property).
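
The two characteristic properties can be checked numerically on a concrete
case. The sketch below assumes that Clayton’s copula (3.43) has its standard
bivariate form $C(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$; the
function name `clayton` and the test values are illustrative choices, not taken
from the text.

```python
def clayton(u, v, theta):
    """Bivariate Clayton copula in its standard form (assumed), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


theta = 2.0
u, v, w = 0.3, 0.6, 0.8

# Associativity (3.55): both nestings give the same trivariate value.
left = clayton(u, clayton(v, w, theta), theta)    # C2(u, C2(v, w))
right = clayton(clayton(u, v, theta), w, theta)   # C2(C2(u, v), w)

# Diagonal inequality (3.56): C(u, u) < u on (0, 1).
diag = clayton(u, u, theta)

print(left, right, diag)
```

Both nestings reduce to $(u^{-\theta} + v^{-\theta} + w^{-\theta} - 2)^{-1/\theta}$,
which is why the two values agree up to rounding.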

Archimedean copulas obey an important limit theorem [260] of the type
of the Gnedenko-Pickands-Balkema-de Haan (GPBH) theorem (see Chap. 2).
Consider two random variables, $X$ and $Y$, distributed uniformly on $[0,1]$, and
whose dependence structure can be described by an Archimedean copula $C$.
Then, the copula associated with the distribution of left-ordered quantiles
tends, in most cases, to Clayton’s copula (3.43) in the limit where the
probability level of the quantiles goes to zero. To be more specific, let us
denote by $\varphi$ the generator of the copula $C$, assumed differentiable. Let
us define the conditional distribution


    F_u(x) = \Pr[X \le x | X \le u, Y \le u] = \frac{C(x \wedge u, u)}{C(u, u)} ,  \forall x \in [0,1] ,    (3.57)


where $x \wedge u$ means the minimum of $x$ and $u$, and the conditional copula

    C_u(x, y) = \Pr[X \le F_u^{-1}(x), Y \le F_u^{-1}(y) | X \le u, Y \le u]
              = \frac{C\left( F_u^{-1}(x), F_u^{-1}(y) \right)}{C(u, u)} .    (3.58)


One can first show that, provided that $\varphi$ is a strict generator (that is,
$\varphi(0)$ is infinite such that $\varphi^{[-1]} = \varphi^{-1}$), $C_u$ is a
strict Archimedean copula with generator:


    \varphi_u(t) = \varphi\left( F_u^{-1}(t) \right) - \varphi(u) ,    (3.59)
                 = \varphi\left( t \cdot \varphi^{-1}(2\varphi(u)) \right) - 2\varphi(u) ,    (3.60)


from which it follows that the limiting behavior of $C_u$, as $u$ goes to zero, is:

    \lim_{u \to 0} C_u(x, y) = C^{Cl}_\theta(x, y) ,  \forall (x, y) \in [0,1] \times [0,1] ,    (3.61)


provided that $\varphi$ is regularly varying¹¹ at zero, with index
$\theta \in R_+$. When $\theta = 0$, $C_u$ tends to the independent copula while
it tends to the Fréchet-Hoeffding upper bound copula when $\theta = \infty$.

Thus, Clayton’s copula plays, in some sense, a role similar in $n$ dimensions
to the generalized Pareto distribution in one dimension:


    G_\xi(x) = 1 - (1 + \xi \cdot x)^{-1/\xi} .    (3.62)


This result is of particular relevance in the study of multivariate statistics of 
extremes. 


3.3.3 Extreme Value Copulas 


Another family of copulas which is of common use is that of extreme value 
copulas. These copulas are derived from the dependence structure of mul- 
tivariate generalized extreme value (GEV) distributions, which provide the 
limit distributions of the component-wise maxima of n-dimensional random 
vectors, after a suitable normalization. 


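Consider $T$ iid $n$-dimensional random vectors $X_k = (X_{k1}, \ldots, X_{kn})$,
$k = 1, \ldots, T$, with distribution function $F$, and their component-wise maxima

    M_{j,T} = \max_{1 \le k \le T} X_{kj} .    (3.63)


For suitably chosen norming sequences $(a_{j,T}, b_{j,T})$, the limit distribution


    \lim_{T \to \infty} \Pr\left[ \frac{M_{1,T} - b_{1,T}}{a_{1,T}} \le z_1, \ldots, \frac{M_{n,T} - b_{n,T}}{a_{n,T}} \le z_n \right]    (3.64)
    = \lim_{T \to \infty} F^T\left( a_{1,T} \cdot z_1 + b_{1,T}, \ldots, a_{n,T} \cdot z_n + b_{n,T} \right) ,    (3.65)

if it exists, is given by

    C\left( H_{\xi_1}(z_1), \ldots, H_{\xi_n}(z_n) \right) ,    (3.66)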


where $H_\xi$ is a GEV distribution (see Chap. 2), and $C$ is, by definition, an
extreme value copula. Therefore, accounting for the general representation of
multivariate extreme value (MEV) distributions (see [107]), we can state that:


11 See footnote 3 page 39.

Fig. 3.6. Contour plot of Gumbel’s copula (left panel) and of its density (right
panel) for the parameter value $\theta = 2$


Definition 3.3.2 (Extreme Value Copula). Any copula which admits the
representation:


    C(u_1, \ldots, u_n) = \exp\left[ -V\left( -\frac{1}{\ln u_1}, \ldots, -\frac{1}{\ln u_n} \right) \right] ,    (3.67)

with

    V(x_1, \ldots, x_n) = \int_{\Pi_n} \max_i \left( \frac{w_i}{x_i} \right) dH(w) ,    (3.68)


where $H$ is any positive finite measure such that $\int_{\Pi_n} w_i \, dH(w) = 1$
and $\Pi_n$ is the $(n-1)$-dimensional unit simplex:


    \Pi_n = \left\{ w \in R_+^n ; \sum_{i=1}^n w_i = 1 \right\} ,    (3.69)

is an extreme value copula.


One immediately observes that $V$ is a homogeneous function of degree $-1$.
Thus, any extreme value copula satisfies [248]:


    C(u_1^\alpha, \ldots, u_n^\alpha) = C(u_1, \ldots, u_n)^\alpha ,    (3.70)


for all $u \in [0,1]^n$ and all $\alpha > 0$.

It is now easy to check that Gumbel’s copula (3.44) belongs to the class of
extreme value copulas. It is depicted in Fig. 3.6: apart from a slight asymmetry
with respect to the second bisectrix, it looks similar to a Student’s copula.
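
Property (3.70) is easy to verify numerically for Gumbel’s copula. The sketch
below assumes that (3.44) is the standard bivariate Gumbel form
$C(u,v) = \exp\{-[(-\ln u)^\theta + (-\ln v)^\theta]^{1/\theta}\}$; the function
name `gumbel` and the numerical values are illustrative assumptions.

```python
import math


def gumbel(u, v, theta):
    """Bivariate Gumbel copula in its standard form (assumed), theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


theta, alpha = 2.0, 3.5
u, v = 0.4, 0.7

lhs = gumbel(u ** alpha, v ** alpha, theta)   # C(u^alpha, v^alpha)
rhs = gumbel(u, v, theta) ** alpha            # C(u, v)^alpha
print(lhs, rhs)
```

The equality holds because $-\ln(u^\alpha) = \alpha(-\ln u)$, so the factor
$\alpha$ comes out of the bracket with exponent $\theta \cdot (1/\theta) = 1$.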

The Fréchet-Hoeffding upper bound copula is also an extreme value copula
since


    \min(u_1^\alpha, \ldots, u_n^\alpha) = \min(u_1, \ldots, u_n)^\alpha .    (3.71)

It is interesting to notice that this copula is the only associative extreme value
copula which is not Archimedean. Indeed, due to the relation (3.70), either
$C(u, \ldots, u) = u$ for all $u \in [0,1]$ or $C(u, \ldots, u) < u$ for all
$u \in (0,1)$.¹² Since the Fréchet-Hoeffding upper bound copula is the only copula
such that $C(u, \ldots, u) = u$ for all $u \in [0,1]$ [370], we can conclude that
any other extreme value copula which enjoys the associativity property is an
Archimedean copula.


3.4 Universal Bounds for Functionals 
of Dependent Random Variables 


In many situations, one has to consider various non-linear operations on sets 
of dependent random variables. For instance, one has to aggregate several 
risky positions in a portfolio, or to evaluate the pay-off of a derivative on a 
basket of several underlying assets. Very often, the actual dependence of the 
random variables under consideration is not known with sufficient accuracy. It 
is therefore interesting to ask whether it would be possible to obtain (sharp) 
bounds for the distribution of aggregated losses of a portfolio or of the
pay-offs of a derivative constructed on a basket of assets. We will discuss in detail
these two important examples in Sect. 3.6. For the time being, let us focus on 
the following mathematical result. 

Consider $n$ random variables $X_1, \ldots, X_n$ with margins $F_1, \ldots, F_n$
and unknown copula $C$. Let $\psi: R^n \to R$ and let
$Y = \psi(X_1, \ldots, X_n)$. The most general result on bounds for
$\Pr[Y \le y]$¹³ has been recently derived by Embrechts, Höing and Juri [145]:


Theorem 3.4.1. Let $X_1, \ldots, X_n$ be $n$ random variables with margins
$F_1, \ldots, F_n$ and copula $C$. Let $\psi: R^n \to R$ be an increasing function,
left continuous in its last argument. Provided that there exist two functions
$C_{inf}$ and $C_{sup}$, increasing in each of their arguments, such that
$C \ge C_{inf}$ and $C^* \le C_{sup}$ (where the expression of the dual copula
$C^*$ is given in Definition 3.2.2), then


    F_{inf}(y) \le \Pr[\psi(X_1, \ldots, X_n) \le y] \le F_{sup}(y) ,    (3.72)

where

    F_{inf}(y) = \sup_{x_1, \ldots, x_{n-1} \in R} C_{inf}\left( F_1(x_1), \ldots, F_{n-1}(x_{n-1}), F_n(\xi(x_1, \ldots, x_{n-1}, y)) \right) ,    (3.73)

    F_{sup}(y) = \inf_{x_1, \ldots, x_{n-1} \in R} C_{sup}\left( F_1(x_1), \ldots, F_{n-1}(x_{n-1}), F_n(\xi(x_1, \ldots, x_{n-1}, y)) \right) ,    (3.74)

with

    \xi(x_1, \ldots, x_{n-1}, y) = \sup\{ t \in R ; \psi(x_1, \ldots, x_{n-1}, t) < y \} .    (3.75)


A heuristic proof of this result can be found in Appendix 3.A.

12 Assuming that there exists a number $u^* \in (0,1)$ such that
$C(u^*, \ldots, u^*) = u^*$, and raising this equation to the power $\alpha$, it
follows that $C(u^{*\alpha}, \ldots, u^{*\alpha}) = u^{*\alpha}$ for any positive
$\alpha$, by (3.70). Note that $u^{*\alpha}$ spans the entire interval $(0,1)$ when
$\alpha$ ranges from zero to infinity. Thus, for all $u \in (0,1)$,
$C(u, \ldots, u) = u$, and since this equality still holds when
$u = (0, \ldots, 0)$ and $u = (1, \ldots, 1)$, we have:

    \exists u^* \in (0,1), C(u^*, \ldots, u^*) = u^*  \Longrightarrow  C(u, \ldots, u) = u , \forall u \in [0,1] ,

so that either $C(u, \ldots, u) = u$ for all $u \in [0,1]$ or $C(u, \ldots, u) < u$
for all $u \in (0,1)$.
13 Former results concerning the case where $\psi$ is a sum of $n$ terms or where
$\psi$ is an increasing continuous function can be found for instance in
[327, 179, 486, 126].

In this theorem, $C_{inf}$ and $C_{sup}$ can be copulas, but this is not necessary.
In particular, since any copula is larger than the Fréchet-Hoeffding lower bound,
in the absence of any information on the dependence between the random
variables, one can always resort to


    C_{sup}(u_1, \ldots, u_n) = C_{inf}(u_1, \ldots, u_n) = \max(u_1 + \cdots + u_n - n + 1, 0) .    (3.76)


This allows one to derive a universal bound for the probability that
$\psi(X_1, \ldots, X_n)$ be less than $y$. Obviously, when additional information
on the dependence is available, the bound can be improved. For instance, when
the random variables are known to be positive orthant dependent (we will
come back to this notion in Chap. 4), we can choose the independence (or
product) copula¹⁴ for $C_{inf}$ and $C_{sup}$.

The bound provided by Theorem 3.4.1 is point-wise the best possible.
Indeed, as shown in [145, Theorem 3.2], there always exists a copula $C$ for
$X_1, \ldots, X_n$ such that the distribution of $\psi(X_1, \ldots, X_n)$ reaches
the bound, at least at one point. Therefore, on the entire set of distribution
functions, it is not possible to improve on this bound.

To conclude this section, let us state a straightforward bound implied
by Theorem 3.4.1 for expectations. Denoting by $X_{inf}$ and $X_{sup}$ two random
variables with distribution functions $F_{inf}$ and $F_{sup}$ respectively, and
by $G$ a non-decreasing function, we obviously have:


    E[G(X_{sup})] \le E[G(\psi(X_1, \ldots, X_n))] \le E[G(X_{inf})] .    (3.78)

A similar bound exists, mutatis mutandis, for any non-increasing function.

14 Recall that the independence (or product) copula is:

    C(u, v) = u \cdot v ,  \forall u, v \in [0,1] ,

so that:

    \Pr[X \le x, Y \le y] = F(x, y) = C(F_X(x), F_Y(y))
                          = F_X(x) \cdot F_Y(y) = \Pr[X \le x] \cdot \Pr[Y \le y] .


3.5 Simulation of Dependent Data
with a Prescribed Copula


An important practical application of copulas consists in the simulation of 
random variables with prescribed margins and various dependence structures 
in order to perform Monte-Carlo studies [171, 244], to generate scenarios for 
stress-testing investigations or to analyze the sensitivity of portfolio alloca- 
tions to various parameters. We will come back to these various applications 
in the next section. 

Here, we present several algorithms for the simulation of random variables 
with copulas characterizing a large class of dependences. The conceptually 
simplest approach is the acceptance-rejection method [218, 243]. However, 
this method is relatively slow in large dimensions, and therefore becomes 
unreliable due to the smallness of the size of obtainable statistical samples. 
In addition, it does not lend itself well to the study of the impact of the 
dependence structure on the optimal allocation of assets, for instance. As a 
consequence, another approach is desirable. 

In fact, Sklar’s theorem shows that the generation of $n$ random variables
$X_1, \ldots, X_n$ with margins $F_1, \ldots, F_n$ and copula $C$ can be performed
as follows:


1. Generate $n$ random variables $u_1, \ldots, u_n$ with uniform margins and
copula $C$.


2. Apply the inversion method to each $u_i$, in order to generate each $x_i$:

    x_i = F_i^{-1}(u_i) ,    (3.79)

where $F_i^{-1}$ denotes the (generalized) inverse of $F_i$.


Therefore, the main difficulty in generating $n$ random variables following the
joint distribution $H(x_1, \ldots, x_n) = C(F_1(x_1), \ldots, F_n(x_n))$ lies in the
generation of $n$ auxiliary random variables with uniform margins and dependence
structure given by the copula $C$. We will now present two methods to simulate
$n$-dimensional random vectors: the first one is specific to elliptical copulas
while the second one applies to a wide range of copulas.


3.5.1 Simulation of Random Variables Characterized 
by Elliptical Copulas 


The simulation of random variables whose dependence structure is given by 
an elliptical copula is particularly simple. This is one of the many appeals of 
this family of copulas. 

By virtue of the invariance Theorem 3.2.2, the simulation of random variables
with an elliptical copula is equivalent to the problem of the simulation
of elliptically distributed random variables. Therefore, simulating an
$n$-dimensional vector $X = (X_1, \ldots, X_n)$ following an $n$-Gaussian copula
with correlation matrix $\rho$ is particularly easy. Indeed, one has just to use
the following algorithm.

Fig. 3.7. Five thousand realizations of two random variables whose distribution
function is given by the Gaussian copula with correlation coefficient $\rho = 0.4$
(left panel) and $\rho = 0.8$ (right panel)


Algorithm 1


1. Generate $n$ independent standard Gaussian random variables
   $u = (u_1, \ldots, u_n)$ using the Box-Müller algorithm [77], for instance,

2. find the Cholesky decomposition of $\rho$: $\rho = A \cdot A^t$, where $A$ is a
   lower-triangular matrix,

3. set $y = A \cdot u$,

4. and finally evaluate $x_i = \Phi(y_i)$, $i = 1, \ldots, n$, where $\Phi$ denotes
   the univariate standard Gaussian distribution function.
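
The four steps of Algorithm 1 can be sketched with the Python standard library
only. This is an illustration under stated assumptions rather than a reference
implementation: the helper names (`cholesky`, `std_normal_cdf`,
`gaussian_copula_sample`) are ours, and `random.gauss` is used as the Gaussian
generator in place of an explicit Box-Müller routine.

```python
import math
import random


def cholesky(rho):
    """Lower-triangular A with rho = A A^t (rho symmetric positive definite)."""
    n = len(rho)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][j] = math.sqrt(rho[i][i] - s)
            else:
                a[i][j] = (rho[i][j] - s) / a[j][j]
    return a


def std_normal_cdf(y):
    """Univariate standard Gaussian distribution function Phi."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def gaussian_copula_sample(a, rng):
    """One draw from the Gaussian copula whose Cholesky factor is a."""
    n = len(a)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]                          # step 1
    y = [sum(a[i][k] * u[k] for k in range(i + 1)) for i in range(n)]    # step 3
    return [std_normal_cdf(yi) for yi in y]                              # step 4


rng = random.Random(0)
rho = [[1.0, 0.8], [0.8, 1.0]]
A = cholesky(rho)                                                        # step 2
sample = [gaussian_copula_sample(A, rng) for _ in range(5000)]
```

The Cholesky factor is computed once and reused for every draw, since it
depends only on $\rho$.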


To generate an $n$-dimensional random vector drawn from a more complicated
elliptical copula, it is useful to recall that any centered and elliptically
distributed random vector $X$ admits the following stochastic representation
[252]:


    X = R \cdot N ,    (3.80)


where $N$ is a centered Gaussian vector with covariance matrix $\Sigma$ and $R$ is
a positive random variable independent of $N$. As an example, to generate an
$n$-dimensional random vector drawn from a Student copula with $\nu$ degrees of
freedom and shape matrix $\rho$, one has to follow


Algorithm 2


1. Generate $n$ independent standard Gaussian random variables
   $u = (u_1, \ldots, u_n)$,

2. find the Cholesky decomposition of $\rho$: $\rho = A \cdot A^t$,

3. set $z = A \cdot u$,

4. generate a random variable $r$, independent of $z = (z_1, \ldots, z_n)$ and
   following a $\chi^2$-distribution with $\nu$ degrees of freedom,

5. set $y = \sqrt{\nu / r} \cdot z$,

6. and finally, evaluate $x_i = T_\nu(y_i)$, $i = 1, \ldots, n$, where $T_\nu$
   denotes the univariate standard Student’s distribution function with $\nu$
   degrees of freedom.


Fig. 3.8. Five thousand realizations of two random variables whose distribution
function is given by Student’s copula with shape coefficient $\rho = 0.4$ (left
panel) and $\rho = 0.8$ (right panel) and $\nu = 3$ degrees of freedom


When the representation (3.80) is known explicitly, as in the example
involving the Gaussian or Student copulas, the generation of the $n$-dimensional
random vector by Algorithm 2 is straightforward. However, the law of the
random variable $R$ is difficult to derive for most elliptical distributions. In
that case, the general algorithm described in the next paragraph is much
more useful.


3.5.2 Simulation of Random Variables Characterized
by Smooth Copulas

The second general method is based upon the simple fact that:

    \Pr[U_1 \le u_1, \ldots, U_n \le u_n] = \Pr[U_n \le u_n | U_1 = u_1, \ldots, U_{n-1} = u_{n-1}]
                                            \times \Pr[U_1 \le u_1, \ldots, U_{n-1} \le u_{n-1}] ,

which gives

    \Pr[U_1 \le u_1, \ldots, U_n \le u_n] = \Pr[U_n \le u_n | U_1 = u_1, \ldots, U_{n-1} = u_{n-1}]
                                            \times \Pr[U_{n-1} \le u_{n-1} | U_1 = u_1, \ldots, U_{n-2} = u_{n-2}]
                                            \times \cdots
                                            \times \Pr[U_2 \le u_2 | U_1 = u_1] \cdot \Pr[U_1 \le u_1]    (3.81)


by a straightforward recursion.
Therefore, applying this reasoning to the $n$-copula $C$, and denoting by $C_k$
the copula of the $k$ first variables, this yields:


    C(u_1, \ldots, u_n) = C_n(u_n | u_1, \ldots, u_{n-1}) \cdots C_2(u_2 | u_1) \cdot \underbrace{C_1(u_1)}_{= u_1} ,    (3.82)

where we define:


    C_k(u_k | u_1, \ldots, u_{k-1}) = \frac{\partial_{u_1} \cdots \partial_{u_{k-1}} C_k(u_1, \ldots, u_k)}{\partial_{u_1} \cdots \partial_{u_{k-1}} C_{k-1}(u_1, \ldots, u_{k-1})} .    (3.83)


As a consequence, in order to simulate $n$ random variables with copula $C$,
one just has to


1. generate $n$ uniform and independent random variables: $v_1, \ldots, v_n$,
2. set $u_1 = v_1$,
3. set $u_2 = C_2^{-1}(v_2 | u_1)$,
   ...
n+1. set $u_n = C_n^{-1}(v_n | u_1, \ldots, u_{n-1})$.


This algorithm is particularly efficient when one considers Archimedean
copulas. Genest and MacKay [198] have shown that, in such a case, it is very
simple to generate pairs of random variables whose distribution function is
given by the copula $C$ with generator $\varphi$. Indeed, the previous algorithm
simply leads to


1. generate two uniform and independent random variables: $v_1$, $v_2$,
2. set $u_1 = v_1$,
3. set $u_2 = \varphi^{-1}\left[ \varphi\left( \varphi'^{-1}\left( \varphi'(u_1) / v_2 \right) \right) - \varphi(u_1) \right]$.

Applying this simplified algorithm to simulate Frank’s copula leads to the
following algorithm:


1. generate two uniform and independent random variables: $v_1$, $v_2$,
2. set $u_1 = v_1$,
3. set $u_2 = -\frac{1}{\theta} \ln\left( 1 + \frac{v_2 (e^{-\theta} - 1)}{v_2 + (1 - v_2) e^{-\theta u_1}} \right)$.
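
The conditional-inversion scheme for Frank’s copula can be sketched as
follows. The sketch assumes the standard form of Frank’s copula,
$C(u,v) = -\theta^{-1} \ln[1 + (e^{-\theta u} - 1)(e^{-\theta v} - 1)/(e^{-\theta} - 1)]$,
for which the conditional distribution $C_2(v|u) = \partial C / \partial u$ can be
inverted in closed form; the function names are illustrative.

```python
import math
import random


def frank_conditional(u1, u2, theta):
    """Conditional distribution C_2(u2 | u1) of the (assumed) Frank copula."""
    a = math.exp(-theta * u1)
    b = math.exp(-theta * u2) - 1.0
    g = math.exp(-theta) - 1.0
    return a * b / (g + (a - 1.0) * b)


def frank_inverse_conditional(v2, u1, theta):
    """Step 3: solve C_2(u2 | u1) = v2 for u2."""
    num = v2 * (math.exp(-theta) - 1.0)
    den = v2 + (1.0 - v2) * math.exp(-theta * u1)
    return -math.log(1.0 + num / den) / theta


def frank_pair(theta, rng):
    """One pair (u1, u2) with uniform margins and Frank dependence."""
    v1, v2 = rng.random(), rng.random()                  # step 1
    u1 = v1                                              # step 2
    return u1, frank_inverse_conditional(v2, u1, theta)  # step 3


rng = random.Random(0)
pairs = [frank_pair(5.0, rng) for _ in range(5000)]
```

Evaluating `frank_conditional` at the output of `frank_inverse_conditional`
returns the input level `v2`, which gives a quick consistency check of the
inversion.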
The same scheme can also be used to simulate Clayton’s copula. However,
Devroye [129] has proposed a somewhat simpler method for Clayton’s copula
with positive parameter $\theta$:


1. generate two standard exponential random variables: $v_1$, $v_2$,
2. generate a random variable $x$ following the distribution $\Gamma(\theta^{-1}, 1)$,
3. set $u_1 = (1 + v_1/x)^{-1/\theta}$ and $u_2 = (1 + v_2/x)^{-1/\theta}$.


This approach is in fact related to Marshall and Olkin’s work [348]. Indeed,
it is straightforward to check that, with the specification above, one has:


    \Pr[U_i \le u_i | X = x] = e^{-x (u_i^{-\theta} - 1)} ,    (3.84)


as in (3.54). Figure 3.9 provides an example of the realizations obtained by
the use of these two algorithms.
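
A minimal sketch of the Gamma-frailty construction for Clayton’s copula, using
only the Python standard library, is given below. It assumes the steps read
$u_i = (1 + v_i/x)^{-1/\theta}$ with $x \sim \Gamma(\theta^{-1}, 1)$, which is
the form consistent with (3.84); the function name and the guard against a
zero Gamma draw are our additions.

```python
import random


def clayton_pair(theta, rng):
    """One pair (u1, u2) from Clayton's copula with parameter theta > 0."""
    v1 = rng.expovariate(1.0)                  # step 1: standard exponentials
    v2 = rng.expovariate(1.0)
    x = rng.gammavariate(1.0 / theta, 1.0)     # step 2: Gamma(1/theta, 1) frailty
    if x == 0.0:
        # Degenerate draw: the limit x -> 0 gives u1 = u2 = 0.
        return 0.0, 0.0
    u1 = (1.0 + v1 / x) ** (-1.0 / theta)      # step 3
    u2 = (1.0 + v2 / x) ** (-1.0 / theta)
    return u1, u2


rng = random.Random(0)
theta = 1.0
pairs = [clayton_pair(theta, rng) for _ in range(5000)]

# Empirical frequency of a joint lower-tail event, to be compared with
# C(0.1, 0.1) = (0.1**-theta + 0.1**-theta - 1) ** (-1/theta).
tail_freq = sum(1 for a, b in pairs if a < 0.1 and b < 0.1) / len(pairs)
tail_theory = (2.0 * 0.1 ** -theta - 1.0) ** (-1.0 / theta)
print(tail_freq, tail_theory)
```

Conditionally on $x$, the event $u_i \le u$ is the event
$v_i \ge x(u^{-\theta} - 1)$, whose probability is exactly the right-hand side
of (3.84).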

A similar algorithm works for Gumbel’s copula with parameter $\theta > 1$,
but it requires the generation of a random variable following a positive stable
law with tail index $1/\theta$, since the inverse of the generator of such a copula
is $\varphi^{-1}(t) = e^{-t^{1/\theta}}$, $t > 0$. For an overview and software to
generate random Lévy variables, see the Web pages of Professors J. Huston
McCulloch (http://economics.sbs.ohio-state.edu/jhm/jhm.html) and John P. Nolan
(http://academic2.american.edu/~jpnolan/stable/stable.html).

Fig. 3.9. Five thousand realizations of two random variables whose distribution
function is given by Clayton’s copula with parameter $\theta = 1$ (left panel) and
by Frank’s copula with parameter $\theta = 5$ (right panel)

To conclude on the question concerning the simulation of dependent ran- 
dom variables, the second approach is sometimes more appropriate for n- 
copulas, with n > 2, because the algorithm based upon the inversion of the 
conditional copulas can rapidly become intractable for large n. 


3.6 Application of Copulas 


This section reviews several applications of copulas to risk assessment, in par- 
ticular to tail risks in the presence of dependence [145, 179, 336], to option 
pricing [99, 417] and also to default risks [184, 302, 303]. In view of the growing 
importance of copulas in financial applications [100], an exhaustive presenta- 
tion is not realistic. We thus restrict our discussion to examples that we have 
found particularly illustrative. For many other examples, see the following 
references concerning portfolio theory [228, 338], performance measurements 
[240], insurance applications [183, 272] or decision theory [258], among many 
others. 


3.6.1 Assessing Tail Risk 


One of the most important activities in the financial as well as in the actuarial
worlds consists in assessing the risk of uncertain aggregated positions. This
risk is often measured by the Value-at-Risk $VaR_\alpha$ at probability level
$\alpha$. $VaR_\alpha$ is the lower $\alpha$-quantile of the net risk position $Y$,
as illustrated in Fig. 3.10:


    VaR_\alpha = \inf\{ t \in R ; \Pr[Y \le t] \ge \alpha \} .    (3.85)


Fig. 3.10. Value-at-Risk at probability level $\alpha$ for the loss $Y$ with
distribution function $F_Y$. We show the case where the distribution has a gap to
exemplify how the Value-at-Risk is defined in such a degenerate case


In this definition of the Value-at-Risk, we take the convention of counting
losses as positive. In this section, we show how to bound the Value-at-Risk of
a portfolio using copulas.

Considering $n$ risky investments or insurance losses $X_1, \ldots, X_n$, the net
risk of the position is:


    Y = \sum_{i=1}^n w_i \cdot X_i ,    (3.86)


where $w_i$ denotes the weight of position $i$ in the portfolio. It is convenient
to define $\tilde{X}_i = w_i \cdot X_i$, so that $Y$ simply becomes the sum of the
$\tilde{X}_i$’s. If $F_i$ is the distribution function of each $X_i$, the
distribution function of $\tilde{X}_i$ is $\tilde{F}_i(\cdot) = F_i(\cdot / w_i)$.
Now, applying Theorem 3.4.1 with $\psi(x_1, \ldots, x_n) = x_1 + \cdots + x_n$,
we obtain, using slightly different notations:


    F_{min}(y) \le \Pr[Y \le y] \le F_{max}(y) ,    (3.87)


with


    F_{min}(y) = \sup_{x \in A(y)} \max\left\{ \sum_{i=1}^n \tilde{F}_i^-(x_i) - (n-1), 0 \right\} ,    (3.88)

    F_{max}(y) = \inf_{x \in A(y)} \min\left\{ \sum_{i=1}^n \tilde{F}_i(x_i), 1 \right\} ,    (3.89)


where $\tilde{F}_i^-$ denotes the left limit of $\tilde{F}_i$ and


    A(y) = \left\{ x = (x_1, \ldots, x_n) \in R^n ; \sum_{i=1}^n x_i = y \right\} .    (3.90)


Therefore, a tight bound for the Value-at-Risk of the aggregated position $Y$
is:


    VaR_\alpha^{min} \le VaR_\alpha(Y) \le VaR_\alpha^{max} ,    (3.91)

with:

    VaR_\alpha^{min} = \inf\{ t \in R ; F_{max}(t) \ge \alpha \} ,    (3.92)

and

    VaR_\alpha^{max} = \inf\{ t \in R ; F_{min}(t) \ge \alpha \} .    (3.93)


These two relations have a clear economic meaning: they represent respectively 
the most optimistic and pessimistic outcomes one can expect in the absence 
of any information on the actual dependence structure between the different 
sources of risk. 

A closed-form expression for $F_{min}$ and $F_{max}$ is almost impossible to obtain
in the general case where the marginal distributions of each of the assets
are different. However, when all the risks can be described by distributions
belonging to the same class, some general results have been obtained [126]. As
an example, let us consider the case of a portfolio made of $n$ risks (with the
set of weights $\{w_i, i = 1, \ldots, n\}$) following shifted-Pareto distributions
with the same tail index $\beta > 0$:


    F_i(x) = 1 - \left( \frac{d_i}{d_i + (x - \delta_i)} \right)^\beta ,  x \ge \delta_i .    (3.94)


This model provides a reasonable description of the tails of the distribution
of returns of financial assets, such as stock returns or FX (foreign exchange)
rates as discussed in Chap. 2, with $\beta \approx 3$-$4$. Shifted-Pareto
distributions are also relevant for modeling insurance claims associated with
industrial [496, 497] as well as natural disasters like earthquakes [274, 411],
floods [386] or fires [328]. Using (3.94), the upper and lower bounds for the
Value-at-Risk are given by:


    VaR_\alpha^{min} = \sum_{i=1}^n w_i \cdot \delta_i + \max_i\{ w_i \cdot d_i \} \cdot \left[ \frac{1}{(1-\alpha)^{1/\beta}} - 1 \right] ,    (3.95)

and

    VaR_\alpha^{max} = \sum_{i=1}^n w_i \cdot (\delta_i - d_i) + \frac{\Lambda}{(1-\alpha)^{1/\beta}} ,    (3.96)

where

    \Lambda = \left[ \sum_{i=1}^n (w_i d_i)^{\frac{\beta}{1+\beta}} \right]^{\frac{1+\beta}{\beta}} .    (3.97)
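
The closed-form bounds can be evaluated directly. The sketch below assumes that
(3.95)-(3.97) read
$VaR^{min}_\alpha = \sum_i w_i \delta_i + \max_i\{w_i d_i\}[(1-\alpha)^{-1/\beta} - 1]$
and
$VaR^{max}_\alpha = \sum_i w_i(\delta_i - d_i) + \Lambda (1-\alpha)^{-1/\beta}$
with $\Lambda = [\sum_i (w_i d_i)^{\beta/(1+\beta)}]^{(1+\beta)/\beta}$, for
margins $F_i(x) = 1 - [d_i/(d_i + x - \delta_i)]^\beta$. The function names and
the numerical inputs are hypothetical.

```python
def pareto_var_bounds(w, delta, d, beta, alpha):
    """Lower and upper VaR bounds for a sum of weighted shifted-Pareto risks."""
    q = (1.0 - alpha) ** (-1.0 / beta)
    scaled = [wi * di for wi, di in zip(w, d)]           # w_i * d_i
    shift = sum(wi * de for wi, de in zip(w, delta))     # sum of w_i * delta_i
    var_min = shift + max(scaled) * (q - 1.0)
    p = beta / (1.0 + beta)
    big_lambda = sum(s ** p for s in scaled) ** (1.0 / p)
    var_max = shift - sum(scaled) + big_lambda * q
    return var_min, var_max


def comonotonic_var(w, delta, d, beta, alpha):
    """VaR when all risks are comonotonic: the sum of the individual quantiles."""
    q = (1.0 - alpha) ** (-1.0 / beta)
    return sum(wi * (de + di * (q - 1.0)) for wi, de, di in zip(w, delta, d))


w = [0.5, 0.3, 0.2]
delta = [0.0, 0.0, 0.0]
d = [1.0, 2.0, 1.5]
lo, hi = pareto_var_bounds(w, delta, d, beta=3.0, alpha=0.99)
mid = comonotonic_var(w, delta, d, beta=3.0, alpha=0.99)
print(lo, mid, hi)
```

The comonotonic value lies strictly inside the interval here, which illustrates
that comonotonicity is not the worst case for the Value-at-Risk of a sum.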


These relations have been obtained by recursion. Indeed, as emphasized by
Frank et al. [179], (3.88-3.89) involve searching an extremum over the
hyperplane $A(y)$. Such an extremum can be found recursively according to


    F_{max}^{(n)}(y) = \inf_{x \in A(y)} \min\left\{ \sum_{i=1}^n \tilde{F}_i(x_i), 1 \right\} ,    (3.98)
                     = \inf_{x \in R} \min\left\{ F_{max}^{(n-1)}(x) + \tilde{F}_n(y - x), 1 \right\} ,    (3.99)

and equivalently

    F_{min}^{(n)}(y) = \sup_{x \in A(y)} \max\left\{ \sum_{i=1}^n \tilde{F}_i^-(x_i) - (n-1), 0 \right\} ,    (3.100)
                     = \sup_{x \in R} \max\left\{ F_{min}^{(n-1)}(x) + \tilde{F}_n^-(y - x) - 1, 0 \right\} .    (3.101)


Unfortunately, this approach is efficient only as long as $F_{max}$ and $F_{min}$
remain of the same class as the distributions $\tilde{F}_i$’s, as occurs in the
shifted-Pareto example (3.94). In general, the $\tilde{F}_i$’s are different and
one has to rely on numerical procedures to derive the bounds of real portfolio
risks.

An efficient numerical algorithm has been proposed by Williamson and
Downs [486]. Starting with $T - 1$ observations of the risks $X_1, \ldots, X_n$,
one first evaluates the upper and lower bounds for the $VaR_\alpha$ of a portfolio
made of $X_1$ and $X_2$. Let $q_i(k/T)$ denote the empirical quantiles of order
$k/T$ of $X_i$. Let us set $-\infty < q_i(0) \le q_i(1/T)$ and
$q_i(1 - 1/T) \le q_i(1) < \infty$. It can be shown that convergent estimators of
$VaR_\alpha^{min}$ and $VaR_\alpha^{max}$ are given by:


    VaR_{k/T}^{min} = \max_{0 \le j \le k} \left\{ q_1(j/T) + q_2((k - j)/T) \right\} ,    (3.102)
    VaR_{k/T}^{max} = \min_{k \le j \le T} \left\{ q_1(j/T) + q_2(1 - (j - k)/T) \right\} .    (3.103)


In practice, the convergence of these estimators is very fast. Using the same
kind of arguments as in (3.98-3.100), it appears that this method can be used
iteratively, making possible the calculation of the bounds for (reasonably)
large portfolios. An illustration of this method for three portfolios made of
large capitalization US stocks is depicted in Fig. 3.11. For a portfolio of ten
stocks and $T = 1500$, only a few seconds are required to obtain the
Value-at-Risk bounds.

Fig. 3.11. Upper and lower bounds for the VaR of a portfolio over the period from
25 January, 1995 to 29 December, 2000 made of two assets (Applied Materials Inc.
and Coca Cola Co: plain lines), five assets (the two above plus E.M.C Corp MA,
General Electric Co, General Motors Corp: dotted lines) and ten assets (the five
above plus Hewlett Packard Co, I.B.M Corp, Intel Corp, Medtronic Inc. and Merck
& Co Inc.: dash-dotted lines). We find practically identical results when exchanging
these assets with others from the largest capitalization stocks. The lower negative
bounds for portfolios of 5 and 10 assets correspond to the favourable situation where
diversification has removed the risks of losses
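
For two risks, the estimators (3.102) and (3.103) can be sketched in a few
lines. The index ranges ($0 \le j \le k$ for the maximum, $k \le j \le T$ for the
minimum) are our reading, chosen so that every quantile order stays in $[0,1]$.
Setting $q_i(0)$ and $q_i(1)$ equal to the sample extremes is one admissible
convention among those allowed by the text. All names are illustrative.

```python
def empirical_quantiles(sample):
    """Return [q(0), q(1/T), ..., q(1)] from T-1 observations.

    q(k/T) is the k-th order statistic for 1 <= k <= T-1; q(0) and q(1) are
    set to the sample minimum and maximum (an assumed convention).
    """
    s = sorted(sample)
    return [s[0]] + s + [s[-1]]


def var_bounds_two_assets(x1, x2, k):
    """Lower and upper bounds on the VaR at level k/T of X1 + X2."""
    q1 = empirical_quantiles(x1)
    q2 = empirical_quantiles(x2)
    T = len(q1) - 1
    lower = max(q1[j] + q2[k - j] for j in range(0, k + 1))        # (3.102)
    upper = min(q1[j] + q2[T + k - j] for j in range(k, T + 1))    # (3.103)
    return lower, upper


# Toy data: 99 observations per risk, so that T = 100 and k = 95 means 95%.
x1 = [float(i) for i in range(1, 100)]
x2 = [float(i) for i in range(1, 100)]
lower, upper = var_bounds_two_assets(x1, x2, k=95)
comonotonic = empirical_quantiles(x1)[95] + empirical_quantiles(x2)[95]
print(lower, comonotonic, upper)
```

Both samples must have the same length. The comonotonic quantile sum is
printed only as a reference point lying between the two bounds.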


3.6.2 Asymptotic Expression of the Value-at-Risk 


In several special cases, the tail risk of a portfolio made of assets exhibiting
nontrivial dependence can be approximately calculated by a linear or
quadratic approximation [472] or by using an asymptotic expansion. Here, we
follow this latter approach and provide an example borrowed from [336].
Consider a portfolio of $N$ assets whose dependence structure is given by
the Gaussian copula. We will discuss the relevance and the limits of this
assumption in Chap. 5. In addition, we assume that the returns of each asset
are distributed according to a so-called modified-Weibull distribution
characterized by its density


    p(x) = \frac{1}{2\sqrt{\pi}} \frac{c}{\chi^{c/2}} |x|^{\frac{c}{2} - 1} e^{-\left( \frac{|x|}{\chi} \right)^c} ,    (3.104)


or more generally

    p(x) = \frac{1}{2\sqrt{\pi}} \frac{c_+}{\chi_+^{c_+/2}} |x|^{\frac{c_+}{2} - 1} e^{-\left( \frac{|x|}{\chi_+} \right)^{c_+}}   if x \ge 0 ,    (3.105)

    p(x) = \frac{1}{2\sqrt{\pi}} \frac{c_-}{\chi_-^{c_-/2}} |x|^{\frac{c_-}{2} - 1} e^{-\left( \frac{|x|}{\chi_-} \right)^{c_-}}   if x < 0 ,    (3.106)


when it is desirable to take into account a possible asymmetry between negative
and positive values (thus leading to possible nonzero mean and skewness
of the returns). This parameterization has the remarkable property that if the
random variable $X$ follows a modified-Weibull law with exponent $c$, then the
variable


    Y = sgn(X) \sqrt{2} \left( \frac{|X|}{\chi} \right)^{c/2} ,    (3.107)

follows a standard Gaussian law. This offers a simple visual test of the
hypothesis that the returns are distributed according to the modified-Weibull
distribution: starting from the empirical returns, one transforms them by
converting the empirical distribution into a Gaussian one. Then, plotting the
transformed variables as a function of the raw returns should give the power law
(3.107) if the modified-Weibull distribution is a good model. Figure 3.12 shows
the (negative) transformed returns of the S&P 500 index as a function of the
raw returns over the time interval from 03 January, 1995 to 29 December,
2000. The double logarithmic scales of Fig. 3.12 are consistent with a power law
with exponent $c/2 = 0.73$ over an extended range of data.

Fig. 3.12. Graph of the Normalized returns $Y$ of the Standard & Poor’s 500 index
(as explained in the text) versus its raw returns $X$, from 03 January, 1995 to 29
December, 2000 for the negative tail of the distribution. The double logarithmic
scales clearly show a straight line over an extended range of data, qualifying the
power law relationship (3.107)
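
The change of variable (3.107) and its inverse can be sketched as follows,
assuming the symmetric form $Y = sgn(X)\sqrt{2}(|X|/\chi)^{c/2}$. The inverse
map gives a simple way to draw modified-Weibull variates from Gaussian ones.
The scale $\chi = 0.01$ is a hypothetical value; $c = 1.46$ corresponds to the
exponent $c/2 = 0.73$ quoted for Fig. 3.12.

```python
import math
import random


def to_gaussian(x, chi, c):
    """(3.107): Y = sgn(X) * sqrt(2) * (|X| / chi) ** (c / 2)."""
    return math.copysign(math.sqrt(2.0) * (abs(x) / chi) ** (c / 2.0), x)


def from_gaussian(y, chi, c):
    """Inverse map: X = sgn(Y) * chi * (|Y| / sqrt(2)) ** (2 / c)."""
    return math.copysign(chi * (abs(y) / math.sqrt(2.0)) ** (2.0 / c), y)


chi, c = 0.01, 1.46
rng = random.Random(1)

# Modified-Weibull variates obtained by transforming standard Gaussian draws.
returns = [from_gaussian(rng.gauss(0.0, 1.0), chi, c) for _ in range(10000)]

# In log-log coordinates the map (3.107) is a straight line of slope c / 2.
x_a, x_b = 0.005, 0.05
slope = (math.log(to_gaussian(x_b, chi, c)) - math.log(to_gaussian(x_a, chi, c))) / (
    math.log(x_b) - math.log(x_a)
)
print(slope)
```

With real data, the Gaussian variable is instead obtained from the empirical
distribution function of the returns, and the slope is estimated by regression
over the range where the plot is straight.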

For such a portfolio constituted of assets with returns distributed according
to modified-Weibull distributions with the same exponent $c > 1$, it can be
shown that the distribution of its returns is still given by a modified-Weibull
law, in the asymptotic regime of large losses (counted as negative). Specifically,
the distribution function $F_S$ of the portfolio losses is asymptotically
equivalent to a modified-Weibull distribution function $F_Z$,


    F_S(x) \sim \lambda \cdot F_Z(x) ,  as x \to -\infty ,    (3.108)


where $\lambda$ is a constant, with the same exponent $c$ and with a scale factor
$\chi$ given by:


    \chi = \left( \sum_{i=1}^N w_i \chi_i \sigma_i \right)^{\frac{c-1}{c}} ,    (3.109)


where $\chi_i$ is the scale factor of asset $i$, $w_i > 0$ is its relative weight
in the portfolio, the $\sigma_i$’s are the solution of


    \sum_{k=1}^N V_{ik} \, w_k \chi_k \, \sigma_k^{1 - c/2} = \sigma_i^{c/2} ,  \forall i = 1, \ldots, N ,    (3.110)


and $V$ is the correlation matrix of the Gaussian copula. The proof of this
result can be found in [336].

For two particular cases, the above equations allow us to retrieve simple
closed-form formulas. For independent assets, one has $V = Id$, so that the
solution of (3.110) is


    \sigma_i = (w_i \chi_i)^{\frac{1}{c-1}} ,  \forall i = 1, \ldots, N ,    (3.111)

and thus

    \chi = \left( \sum_{i=1}^N (w_i \chi_i)^{\frac{c}{c-1}} \right)^{\frac{c-1}{c}} ,  and  \lambda = \left[ \frac{c}{2(c-1)} \right]^{\frac{N-1}{2}} ,    (3.112)


(see Appendix 3.B for a direct proof of this result). For comonotonic assets,
$V_{ij} = 1$ for all $i, j = 1, \ldots, N$, which leads to


    \sigma_i = \left( \sum_{k=1}^N w_k \chi_k \right)^{\frac{1}{c-1}} ,  \forall i = 1, \ldots, N ,    (3.113)

and thus

    \chi = \sum_{i=1}^N w_i \chi_i ,  and  \lambda = 1 .    (3.114)


This result is obvious and can be directly retrieved from the comonotonicity
between the assets. In fact, in such a case the distribution of the portfolio is
a modified-Weibull law, not only asymptotically but exactly over the whole
range.
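
The two closed-form cases can be evaluated with a few lines of Python. The
sketch assumes the independent case reads
$\chi = [\sum_i (w_i\chi_i)^{c/(c-1)}]^{(c-1)/c}$ with
$\lambda = [c/(2(c-1))]^{(N-1)/2}$, and the comonotonic case
$\chi = \sum_i w_i\chi_i$ with $\lambda = 1$. Function names and inputs are
illustrative.

```python
import math


def chi_independent(w, chi, c):
    """Portfolio scale factor for independent assets (V = Id), c > 1."""
    p = c / (c - 1.0)
    return sum((wi * ci) ** p for wi, ci in zip(w, chi)) ** (1.0 / p)


def lambda_independent(n, c):
    """Prefactor lambda for n independent assets."""
    return (c / (2.0 * (c - 1.0))) ** ((n - 1) / 2.0)


def chi_comonotonic(w, chi):
    """Portfolio scale factor for comonotonic assets (lambda = 1)."""
    return sum(wi * ci for wi, ci in zip(w, chi))


w = [0.5, 0.5]
chi = [0.02, 0.04]

# Sanity check with c = 2, where modified-Weibull variables are Gaussian:
# independent scales add in quadrature and the prefactor is exactly one.
gauss_scale = chi_independent(w, chi, 2.0)
gauss_lambda = lambda_independent(len(w), 2.0)

# A sub-Gaussian tail exponent, for comparison of the two dependence cases.
indep = chi_independent(w, chi, 1.5)
como = chi_comonotonic(w, chi)
print(gauss_scale, gauss_lambda, indep, como)
```

For any $c > 1$ the independent scale factor is smaller than the comonotonic
one, since an $\ell_p$ norm with $p = c/(c-1) > 1$ never exceeds the $\ell_1$
norm.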

Denoting by $W(0)$ the initial amount of money invested in the risky portfolio,
the asymptotic Value-at-Risk, at probability level $\alpha$, can easily be
computed with the formula


    VaR_\alpha \simeq W(0) \cdot \frac{\chi}{2^{1/c}} \left[ \Phi^{-1}\left( 1 - \frac{\alpha}{\lambda} \right) \right]^{2/c}    (3.115)
               \simeq \xi(\alpha)^{2/c} \, W(0) \cdot \chi ,    (3.116)


where the function $\Phi(\cdot)$ denotes the cumulative Normal distribution
function and


    \xi(\alpha) \equiv \frac{1}{\sqrt{2}} \Phi^{-1}\left( 1 - \frac{\alpha}{\lambda} \right) .    (3.117)


The example provided here for a portfolio made of assets whose dependence
is described by a Gaussian copula can be easily extended to more complex
cases. For instance, the same kind of asymptotic expansion can be performed
for the Student’s copula. This illustrates the simplification brought by the use
of copulas for some parametric calculations of tail risks.


3.6.3 Options on a Basket of Assets 


As suggested in [99, 417], copulas offer a useful framework for pricing mul- 
tivariate contingent claims. Indeed, they provide natural pricing kernels that 
allow one to determine the price of options defined on a basket of assets by 
simply gathering the prices of options written on each individual asset. 

Following [99], let us consider a market with two risky assets $S_1$ and $S_2$
and a risk-free asset $B$. For simplicity, but without loss of generality, the
risk-free interest rate is set to zero. Let us assume the existence of two digital
options $O_1$ on $S_1$ and $O_2$ on $S_2$ respectively, with maturity $T$. They pay
one monetary unit at time $T$ if the value $S_i(T)$ of the underlying asset at
time $T$ is more than $K_i$. Their price $P_i$ is:


    P_i = E^Q\left[ 1_{\{S_i(T) > K_i\}} \right] = \Pr^Q[S_i(T) > K_i] ,    (3.118)


where $Q$ denotes a risk-neutral probability measure, equivalent to the
historical probability measure $P$. $Q$ is unique when the market is complete.

Now, consider the bivariate digital option $O$ which pays one monetary unit
at time $T$ if the value $S_1(T)$ is larger than $K_1$ and the value $S_2(T)$ is
larger than $K_2$. The price of such an option on a basket of two assets is


    P = E^Q\left[ 1_{\{S_1(T) > K_1, S_2(T) > K_2\}} \right] = \Pr^Q[S_1(T) > K_1, S_2(T) > K_2] .    (3.119)

By Sklar’s Theorem (3.2.1), we can write the price of the bivariate digital
option as a function of the price of each individual digital option:


    P = C^Q(P_1, P_2) ,    (3.120)


where $C^Q$ is a risk-neutral (survival) copula. Just as the individual
risk-neutral density embodies traders’ expectations on future asset prices and
therefore represents a forward-looking indicator of market risk [44], the
risk-neutral copula contains the expectations on future co-movements of the
basket of assets [56].

Accounting for Fréchet-Hoeffding bounds, we can assert that the price of 
any bivariate digital option must satisfy 


max{P, + P, — 1,0} < P< min{P,, Py}. (3.121) 


This relation can be interpreted as a direct consequence of the no-arbitrage 
principle, as we now show. The considered market exhibits four states, denoted 
by HH, HL, LH and LL. In the first state, both $:(T) and S(T) are larger 
than Kk, and K2 respectively. In the second state, only S(T) is larger than kK, 
while in the third state, only S2(T) is larger than Ko. In the fourth state, both 
Si(T) and S2(T) are smaller than Ky, and K2 respectively. It is convenient to 
introduce the vector p whose components are the price of the bivariate digital 
option, of the risk-free asset, and of the two digital options, 


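$$p = \begin{pmatrix} P \\ 1 \\ P_1 \\ P_2 \end{pmatrix} . \tag{3.122}$$

Let us introduce the indicator matrix $\Pi$ defined by

$$\Pi = \begin{pmatrix}
1_{HH}(O) & 1_{HL}(O) & 1_{LH}(O) & 1_{LL}(O) \\
1_{HH}(1) & 1_{HL}(1) & 1_{LH}(1) & 1_{LL}(1) \\
1_{HH}(O_1) & 1_{HL}(O_1) & 1_{LH}(O_1) & 1_{LL}(O_1) \\
1_{HH}(O_2) & 1_{HL}(O_2) & 1_{LH}(O_2) & 1_{LL}(O_2)
\end{pmatrix} , \tag{3.123}$$

where

$$\begin{aligned}
1_{HH}(O_1) &= 1_{\{S_1(T) > K_1\}}\,|\,HH = 1 , & 1_{HH}(O_2) &= 1_{\{S_2(T) > K_2\}}\,|\,HH = 1 , \\
1_{HL}(O_1) &= 1_{\{S_1(T) > K_1\}}\,|\,HL = 1 , & 1_{HL}(O_2) &= 1_{\{S_2(T) > K_2\}}\,|\,HL = 0 , \\
1_{LH}(O_1) &= 1_{\{S_1(T) > K_1\}}\,|\,LH = 0 , & 1_{LH}(O_2) &= 1_{\{S_2(T) > K_2\}}\,|\,LH = 1 , \\
1_{LL}(O_1) &= 1_{\{S_1(T) > K_1\}}\,|\,LL = 0 , & 1_{LL}(O_2) &= 1_{\{S_2(T) > K_2\}}\,|\,LL = 0 .
\end{aligned}$$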


The first row of $\Pi$ in (3.123) corresponds to the bivariate digital option, the
second row to the risk-free asset, the third row to the option on $S_1$ and the
fourth row to the option on $S_2$. The first column corresponds to state HH,
the second column to state HL, the third column to state LH and the fourth
column to state LL. This yields


$$\Pi = \begin{pmatrix}
1 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0
\end{pmatrix} . \tag{3.124}$$


In short, the matrix $\Pi$ allows one to obtain the value of the four assets in
each of the four states of the world.

The absence of arbitrage opportunity amounts to the existence of a vector
$\rho$ with positive components such that the vector $p$ of prices can be written
as follows [103, 214]


$$p = \Pi \cdot \rho . \tag{3.125}$$


Since, in the present case, the market is complete by construction, the matrix
$\Pi$ can be inverted and we have


$$\rho = \Pi^{-1} \cdot p = \begin{pmatrix} P \\ P_1 - P \\ P_2 - P \\ P - P_1 - P_2 + 1 \end{pmatrix} . \tag{3.126}$$

Writing that all the components of $\rho$ are positive is equivalent to:

$$\max\{P_1 + P_2 - 1, 0\} < P < \min\{P_1, P_2\} . \tag{3.127}$$


This retrieves (3.121) except for the fact that the Fréchet-Hoeffding bounds 
are now excluded. In fact, as recalled earlier, the Fréchet-Hoeffding upper 
and lower bounds are associated with comonotonicity and countermonotonicity.
These two situations are obviously excluded from the formulation
in terms of the pricing kernel since the market cannot be considered
as complete in those cases. Therefore, the prices associated with the
Fréchet-Hoeffding bounds are nothing but the static super-replication¹⁵ prices of the
bivariate digital option. Indeed, selling for instance the bivariate digital option
for the price $P = \min\{P_1, P_2\}$, the trader can buy the least expensive of the
two digital options, say $O_1$ if $P_1 < P_2$. Then, at maturity, she can pay one
monetary unit to the buyer of the bivariate digital option with certainty, since
the bivariate option generates a cash-flow of one monetary unit if and only if
the world is in the state HH, for which $O_1$ also generates a cash-flow of one
monetary unit.
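
The equivalence between positive state prices and the strict bounds (3.127) can be checked numerically. The following Python lines are only a minimal sketch: the function names `state_prices`, `is_arbitrage_free` and `strict_frechet_bounds` are illustrative choices of ours, and the risk-free asset is assumed to cost one monetary unit, as in the text.

```python
def state_prices(P, P1, P2):
    """State prices (HH, HL, LH, LL) obtained by inverting p = Pi . rho.

    P is the price of the bivariate digital option, P1 and P2 the prices
    of the two single-asset digital options; the risk-free asset costs 1.
    """
    return (P, P1 - P, P2 - P, P - P1 - P2 + 1.0)


def is_arbitrage_free(P, P1, P2):
    """True when every state price is strictly positive."""
    return all(r > 0.0 for r in state_prices(P, P1, P2))


def strict_frechet_bounds(P, P1, P2):
    """Direct check of max{P1 + P2 - 1, 0} < P < min{P1, P2}."""
    return max(P1 + P2 - 1.0, 0.0) < P < min(P1, P2)


if __name__ == "__main__":
    for P in (0.05, 0.30, 0.45):
        print(P, state_prices(P, 0.5, 0.4), is_arbitrage_free(P, 0.5, 0.4))
```

With $P_1 = 0.5$ and $P_2 = 0.4$, any quoted price $P$ above $0.4$ produces a negative state price for LH, which is the arbitrage exploited by the super-replication argument.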

It is straightforward to extend the previous calculations to the case of 
multivariate digital options written on a larger basket of underlying assets. 
The restriction to bivariate digital options presented here is only for notational 
convenience. 

More generally, let us consider an option written on a basket of $N$ underlying
assets $S_1, \dots, S_N$. Let the pay-off of such an option be


$$G\left[\psi\left(S_1(T), \dots, S_N(T)\right)\right] , \tag{3.128}$$


where $T$ still denotes the maturity. $G$ is typically the univariate pay-off
characterizing the contract. For instance, for a European call with strike $K$, we
have:


$$G(x) = [x - K]^{+} . \tag{3.129}$$


The function $\psi$ describes how the $N$ underlying assets $S_i$ determine the
terminal cash-flow. For instance, one can consider an option on the minimum of
the $N$ assets


$$\psi\left(S_1(T), \dots, S_N(T)\right) = \min\{S_1(T), \dots, S_N(T)\} , \tag{3.130}$$


or on a weighted sum (a portfolio) of these assets


$$\psi\left(S_1(T), \dots, S_N(T)\right) = \sum_{i=1}^{N} w_i\, S_i(T) . \tag{3.131}$$


The fair price of such a contract is, as usual, given by

$$P = E^{Q}\left[G\left(\psi\left(S_1(T), \dots, S_N(T)\right)\right)\right] . \tag{3.132}$$

Using Theorem 3.4.1 and (3.78), we can assert that

$$E^{Q}\left[G\left(S_{\sup}\right)\right] \le P \le E^{Q}\left[G\left(S_{\inf}\right)\right] , \tag{3.133}$$


where $S_{\inf}$ and $S_{\sup}$ are two random variables with distribution functions
$F_{\inf}$ and $F_{\sup}$ respectively (see Theorem 3.4.1).


15 To super-replicate means to hedge with certainty.

As an example, let us consider a rainbow call¹⁶ on the minimum of the
$N$ assets $S_1, \dots, S_N$ with strike $K$ and maturity $T$ [466]. For simplicity, we
assume a zero interest rate. The value of such a contract is:


$$P = E^{Q}\left[\left[\min\{S_1(T), \dots, S_N(T)\} - K\right]^{+}\right] . \tag{3.134}$$


Denoting $\psi = \min\{S_1(T), \dots, S_N(T)\}$, we have


\begin{align}
\Pr^{Q}[\psi \le x] &= \Pr^{Q}\left[\min\{S_1(T), \dots, S_N(T)\} \le x\right] , \tag{3.135}\\
&= 1 - \Pr^{Q}\left[\min\{S_1(T), \dots, S_N(T)\} > x\right] , \tag{3.136}\\
&= 1 - \Pr^{Q}\left[S_1(T) > x, \dots, S_N(T) > x\right] , \tag{3.137}\\
&= 1 - C^{Q}\left(P_1(x), \dots, P_N(x)\right) , \tag{3.138}
\end{align}


where $P_i(x) = \Pr^{Q}[S_i(T) > x]$ is the price of a digital option written on the
underlying asset $S_i$, which pays one monetary unit if $S_i(T)$ is larger than $x$.
This immediately yields


$$1 - \min\{P_1(x), \dots, P_N(x)\} \le \Pr^{Q}[\psi \le x] \tag{3.139}$$

and

$$\Pr^{Q}[\psi \le x] \le 1 - \max\{P_1(x) + \cdots + P_N(x) - (N - 1), 0\} . \tag{3.140}$$

Thus, defining $S_{\inf}$ and $S_{\sup}$ as two random variables such that:

$$\Pr^{Q}[S_{\inf} \le x] = 1 - \min\{P_1(x), \dots, P_N(x)\} , \tag{3.141}$$

$$\Pr^{Q}[S_{\sup} \le x] = 1 - \max\{P_1(x) + \cdots + P_N(x) - (N - 1), 0\} , \tag{3.142}$$

it follows from (3.133) that


$$E^{Q}\left[\left[S_{\sup} - K\right]^{+}\right] \le P \le E^{Q}\left[\left[S_{\inf} - K\right]^{+}\right] . \tag{3.143}$$


The quantitative values of these two bounds are obtained after calibration 
and numerical integration. 
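
To make the last step concrete, here is a hedged Python sketch of the numerical integration. It relies on the identity $E[(S-K)^{+}] = \int_K^{\infty} \Pr[S > x]\,dx$ applied to $S_{\sup}$ and $S_{\inf}$. The calibration is replaced by an assumption of ours: each digital price $P_i(x)$ is taken from a driftless lognormal (Black-Scholes) marginal with zero interest rate. The function names and the trapezoidal grid are illustrative only.

```python
import math


def digital_price(x, s0, sigma, T):
    """Pr^Q[S(T) > x] for a driftless lognormal asset (zero interest rate)."""
    if x <= 0.0:
        return 1.0
    d2 = (math.log(s0 / x) - 0.5 * sigma**2 * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d2 / math.sqrt(2.0)))


def rainbow_min_call_bounds(K, spots, sigmas, T, n_steps=20000):
    """Lower and upper bounds (3.143) for a call on the minimum of N assets.

    Integrates the survival functions of S_sup and S_inf, read off from
    (3.142) and (3.141), over x > K with the trapezoidal rule.
    """
    n = len(spots)
    # Truncate the integral where every digital price is numerically zero.
    x_max = max(K, max(s * math.exp(10.0 * v * math.sqrt(T))
                       for s, v in zip(spots, sigmas)))
    h = (x_max - K) / n_steps
    lower = upper = 0.0
    for k in range(n_steps + 1):
        x = K + k * h
        p = [digital_price(x, s, v, T) for s, v in zip(spots, sigmas)]
        weight = 0.5 if k in (0, n_steps) else 1.0
        lower += weight * max(sum(p) - (n - 1), 0.0)   # survival of S_sup
        upper += weight * min(p)                       # survival of S_inf
    return lower * h, upper * h


if __name__ == "__main__":
    print(rainbow_min_call_bounds(100.0, [100.0, 100.0], [0.2, 0.3], 1.0))
```

For a single asset the two bounds coincide with the ordinary call price, which gives a simple sanity check of the integration.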

To obtain more accurate information on the price of options defined on 
a basket of assets, it is necessary to specify the nature of the risk-neutral 
copula. The problem comes from the fact that there exists no general relation
between the historical copula $C^{P}$ and the risk-neutral copula $C^{Q}$. However, in
some special cases, one can obtain this relation. For instance, in the multivariate
Black-Scholes model, both the historical and the risk-neutral copulas are
Gaussian copulas, with the same correlation matrix. This result generalizes to 
the case where asset prices follow diffusion processes with deterministic drifts 
and volatilities [112]. 

In the more realistic case where one considers a stochastic volatility model 
(under P) like 


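16 Rainbow options get their name from the fact that their underlying is two or
more assets rather than one.

$$\frac{dS_i(t)}{S_i(t)} = \mu_i\left(t, \sigma_i(t)\right) dt + \sigma_i(t)\, dB_i(t), \quad i = 1, \dots, N \tag{3.144}$$

$$d\sigma_i(t) = a_i\left(t, \sigma_i(t)\right) dt + b_i\left(t, \sigma_i(t)\right) dW_i(t), \quad i = 1, \dots, N \tag{3.145}$$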


for instance, where $B_i(t)$ and $W_i(t)$ denote standard Wiener processes and
where $a_i(\cdot,\cdot)$ and $b_i(\cdot,\cdot)$ are chosen such that the $\sigma_i(t)$’s remain positive almost
surely, one cannot express $C^{P}$ and $C^{Q}$ explicitly. In addition, since individual
volatilities are non-traded assets, the market is incomplete, and the choice
of a risk-neutral measure $Q$ (which amounts to choosing the market prices of
volatility risks $\lambda_i$) is not unique. One has to set additional constraints in order
to select an appropriate $Q$. Many methods have been developed for univariate
stochastic volatility models, which can be extended to the multivariate case.
Let us mention the minimal martingale measure [176, 434, 435], the minimal
entropy measure [192, 403] or the variance-optimal measure [54, 177, 226, 292,
384], for instance. All these examples are, in fact, particular cases of $q$-optimal
measures (for $q = 0$, 1 and 2, respectively) [125, 234], i.e. measures which are
the closest to the objective (or historical) measure $P$ in the sense of the $q$th
moment of their relative density. Such measures minimize the functional


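$$H_q(P, Q) = \begin{cases} E^{P}\left[\dfrac{q}{q-1}\left(\dfrac{dQ}{dP}\right)^{q}\right] , & \text{if } Q \ll P , \\ +\infty , & \text{otherwise} , \end{cases} \tag{3.146}$$

for $q \in R \setminus \{0, 1\}$ and

$$H_q(P, Q) = \begin{cases} E^{P}\left[(-1)^{q+1}\left(\dfrac{dQ}{dP}\right)^{q} \ln \dfrac{dQ}{dP}\right] , & \text{if } Q \ll P , \\ +\infty , & \text{otherwise} , \end{cases} \tag{3.147}$$


for $q \in \{0, 1\}$. The symbol “$\ll$” means absolutely continuous, i.e., the sets of
zero measure for $P$ are also sets of zero measure for $Q$.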
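
On a finite state space the functional can be evaluated directly, which may help to see what is being minimized. The sketch below is only illustrative: the function name `h_q` and the representation of $P$ and $Q$ as lists of state probabilities are our assumptions. For $q = 1$ it returns the relative entropy of $Q$ with respect to $P$, and for $q = 0$ the relative entropy of $P$ with respect to $Q$.

```python
import math


def h_q(q, p, qm):
    """H_q(P, Q) on a finite state space.

    p and qm are lists of state probabilities under P and Q.
    Returns +inf when Q is not absolutely continuous with respect to P.
    """
    if any(pi == 0.0 and qi > 0.0 for pi, qi in zip(p, qm)):
        return math.inf
    total = 0.0
    for pi, qi in zip(p, qm):
        if pi == 0.0:
            continue                      # P-null states carry no weight
        z = qi / pi                       # relative density dQ/dP
        if q in (0, 1):
            if z == 0.0:
                if q == 0:
                    return math.inf       # -ln(0) on a state of positive P-mass
                continue                  # z ln z tends to 0 as z tends to 0
            total += pi * (-1.0) ** (q + 1) * z**q * math.log(z)
        else:
            if z == 0.0 and q < 0:
                return math.inf           # z**q diverges for negative q
            total += pi * q / (q - 1.0) * z**q
    return total


if __name__ == "__main__":
    p = [0.5, 0.5]
    for q in (0, 1, 2):
        print(q, h_q(q, p, [0.25, 0.75]))
```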

Such measures have the additional advantage of allowing an interpretation 
in terms of utility maximizing agents. Indeed, asset prices obtained under
$q$-optimal measures represent the marginal utility indifference prices for investors
with HARA¹⁷ utility functions [230].

Using the risk-neutral probability measure $Q$ which amounts to taking a
vanishing market price of the volatility risk, and if in addition the rates of
return $\mu_i(t, \sigma_i(t))$ do not depend on $\sigma_i$, then it can be shown that $C^{P} =
C^{Q}$ (see Appendix 3.C). In such a case, the calibration of the copula under
historical data provides the risk-neutral copula. 

Unfortunately, when these conditions are not met, or when one considers 
more general diffusion models of the form 


$$dS_i(t) = \mu_i\left(t, S_i(t)\right) dt + \sigma_i\left(t, S_i(t)\right) dW_i(t), \quad i = 1, \dots, N , \tag{3.148}$$


it is in general impossible to obtain a relation between $C^{P}$ and $C^{Q}$. In this
case, the risk-neutral copula can only be, and has to be, determined directly


17 Hyperbolic absolute risk aversion.

from options prices. In practice, when one deals with contracts which are not
actively traded, or contracts negotiated OTC,¹⁸ data may be rare, leading to
serious restrictions for the calibration of the risk-neutral copula and showing 
the limit of the approach. 


3.6.4 Basic Modeling of Dependent Default Risks 


Default risk models are basically of two kinds. The first class contains models 
which are close to many actuarial models. They rely on the assumption that, 
conditional on a set of economic factors, the individual default probabilities 
of each obligator are independent. Such models are known as mixture models 
[248]. They include frailty models, presented page 113, as well as professional 
models like CreditRisk+ [114]. It is in general difficult to obtain an analytical
expression of their dependence structure. 

The second class of default risk models are based on Merton’s seminal work 
on firm value [358]. In particular, industry standards like Moody’s KMV [273] 
and RiskMetrics [406] are extensions of this original model. They consider 
that the default of an obligator occurs when a latent variable, which usually 
represents the firm’s asset value, goes below some level usually representing 
the value of the firm’s liabilities. In the more recent model by Li [303], the 
latent variables account for the time-to-default of an obligator and the crossing 
level represents the time horizon of interest. These approaches are equivalent 
since, once a dynamics is specified for the assets, one can derive, in principle, 
the law of the time-to-default. 

These models assume the same dependence structure for the latent vari- 
ables, characterized by a Gaussian copula. Hence, the joint probability of 
default is closely related to the Gaussian copula. Indeed, let us consider N 
obligators and let $D_i$ be the default indicator of obligator $i$. $D_i$ equals one if
obligator $i$ has defaulted and zero otherwise. Let $(X_1, \dots, X_N)$ denote the
vector of latent variables and $(T_1, \dots, T_N)$ the vector of thresholds below which
default occurs:


$$D_i = 1 \iff X_i < T_i . \tag{3.149}$$

The joint probability that obligators $i_1, \dots, i_k$ ($k \le N$) default is

\begin{align}
\Pr\left[D_{i_1} = 1, \dots, D_{i_k} = 1\right] &= \Pr\left[X_{i_1} < T_{i_1}, \dots, X_{i_k} < T_{i_k}\right] , \notag\\
&= C\left(\Pr\left[X_{i_1} < T_{i_1}\right], \dots, \Pr\left[X_{i_k} < T_{i_k}\right]\right) , \notag\\
&= C\left(\pi_{i_1}, \dots, \pi_{i_k}\right) , \tag{3.150}
\end{align}

where $C$ denotes the (Gaussian) copula of the latent variables $X_{i_1}, \dots, X_{i_k}$
and $\pi_{i_1}, \dots, \pi_{i_k}$ are the individual default probabilities of obligators $i_1, \dots, i_k$.
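
For two obligators, (3.150) can be evaluated without a bivariate normal routine if one assumes, as the factor models discussed in this section do, a one-factor representation of the latent variables. The following Python sketch makes that assumption explicit: $X_i = \sqrt{\rho}\, Z + \sqrt{1-\rho}\, \epsilon_i$ with $0 \le \rho < 1$, so that the latent correlation is $\rho$. The function name and the integration grid are illustrative choices of ours.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def joint_default_probability(pi_1, pi_2, rho, n_steps=4000):
    """Pr[D_1 = 1, D_2 = 1] = C(pi_1, pi_2) for a Gaussian copula.

    Assumes X_i = sqrt(rho) Z + sqrt(1 - rho) e_i with 0 <= rho < 1 and
    integrates the conditional default probabilities over the factor Z.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("this sketch needs 0 <= rho < 1")
    t1 = _STD_NORMAL.inv_cdf(pi_1)      # default thresholds T_i
    t2 = _STD_NORMAL.inv_cdf(pi_2)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    z_min, z_max = -8.0, 8.0
    h = (z_max - z_min) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        z = z_min + k * h
        weight = 0.5 if k in (0, n_steps) else 1.0   # trapezoidal rule
        cond = (_STD_NORMAL.cdf((t1 - a * z) / b)
                * _STD_NORMAL.cdf((t2 - a * z) / b))
        total += weight * cond * _STD_NORMAL.pdf(z)
    return total * h


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6, 0.9):
        print(rho, joint_default_probability(0.05, 0.10, rho))
```

The joint default probability moves from the product $\pi_1 \pi_2$ at $\rho = 0$ towards $\min\{\pi_1, \pi_2\}$ as $\rho$ approaches one, in line with the Fréchet-Hoeffding upper bound.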


18 Over-the-counter: a market for securities made up of dealers who may or may 
not be members of a formal securities exchange. The over-the-counter market is 
conducted over the telephone and is a negotiated market rather than an auction 
market such as the NYSE.

In the KMV methodology, the variables $\{X_i\}$ model the return processes
of the assets. They are assumed multivariate Gaussian, and their correlations
are set by a factor model representing the various underlying macroeconomic
variables impacting the dynamics of the asset returns. Each threshold $T_i$ is
determined by an option technique applied to the historical data of the $i$th
firm.

The CreditMetrics approach is also based upon the assumption that the $X_i$’s
are multivariate Gaussian random variables. However, they do not represent
the evolution of the asset value itself but the evolution of the rating of the
firm. The range of each $X_i$ is divided into classes which represent the possible
rating classes of the firm. The classes are determined so that they agree with
historical data. This procedure allows one to fix simultaneously all the values
of the thresholds $\{T_i\}$. Again, the correlations are calibrated by assuming a
factor model. 

In Li’s model, the latent variable $X_i$ is interpreted as the time-to-default
of obligator $i$ and the thresholds $T_i$’s are all equal to $T$, the time horizon over
which the credit portfolio is monitored. Here, the multivariate distribution of
the $X_i$’s is not Gaussian anymore (since, now, the $X_i$’s are positive random
variables). The marginal distribution of each $X_i$ is exponential with parameter
$\lambda_i$:


$$\Pr\left[X_i \le x_i\right] = 1 - e^{-\lambda_i x_i} , \tag{3.151}$$


while the copula remains Gaussian. Again, the correlations between the X;’s 
can be determined from a factor model. 
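
One common way to realize exponential margins with a Gaussian copula is to push a standard Gaussian latent variable through the increasing transform $F^{-1} \circ \Phi$, where $F$ is the exponential distribution function of (3.151). The sketch below illustrates this construction under that assumption; the Gaussian latent draw `x` and the function names are ours and are not part of the model description above.

```python
import math
from statistics import NormalDist


def default_time(x, lam):
    """Map a standard Gaussian latent draw x to an exponential default time.

    Applies tau = F^{-1}(Phi(x)) with F(t) = 1 - exp(-lam * t); being an
    increasing transform, it leaves the Gaussian copula unchanged.
    Valid while Phi(x) < 1 in floating point (x below roughly 8).
    """
    u = NormalDist().cdf(x)
    return -math.log1p(-u) / lam


def default_threshold(lam, horizon):
    """Gaussian latent level below which default occurs before the horizon."""
    return NormalDist().inv_cdf(1.0 - math.exp(-lam * horizon))


if __name__ == "__main__":
    lam, horizon = 0.02, 5.0
    x_star = default_threshold(lam, horizon)
    print(x_star, default_time(x_star, lam))   # second value is close to 5.0
```

Feeding correlated Gaussian draws (for instance from a factor model) into `default_time` then produces dependent default times whose copula is Gaussian.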

This recurrent use of a Gaussian factor model which is equivalent to de- 
scribing the dependence between the latent variables in terms of a Gaussian 
copula has been ratified by the recommendations of the BIS [42] concerning 
credit risk modeling. However, there are many indications suggesting that 
this Gaussian copula approach may be grossly inadequate to account for large 
credit risk (see [186] for instance), since the Gaussian copula might — by con- 
struction — underestimate the largest concomitant risks. We will come back 
in more detail on this crucial point in the next chapter (Chap. 4) where we 
will present and contrast the different available measures of dependence and 
address more precisely how to assess the dependence in the tails of the distri- 
bution. 


Appendix 


3.A Simple Proof of a Theorem on Universal Bounds 
for Functionals of Dependent Random Variables 


Here, we provide a simple heuristic proof of Theorem 3.4.1. For simplicity, we 
restrict ourselves to the bivariate case: we consider a random vector $(X,Y)$
with joint distribution function $F$, and continuous margins $F_X$ and $F_Y$,
respectively. In addition, we assume that the function $\psi$ is continuous and increasing
in each argument. In such a case, provided that $t$ belongs to the range of $\psi$,
the set of points $(x,y)$ such that $\psi(x,y)$ is less than $t$ has a typical shape
represented by the area hatched with plain lines in Fig. 3.13.

Fig. 3.13. The area hatched with plain lines represents the set of points $(x, y)$ such
that $\psi(x,y) \le t$. The area hatched with dashed lines represents the set of points
$(x',y')$ such that $x' \le x^*$ and $y' \le y^*$ for some $(x^*, y^*)$ satisfying $\psi(x^*, y^*) = t$. By
definition, the $F$-measure of this area is $\Pr[X \le x^*, Y \le y^*] = F(x^*,y^*)$

By definition, $\Pr[\psi(X,Y) \le t]$ is the $F$-measure of this hatched area:


$$\Pr\left[\psi(X,Y) \le t\right] = \int_{\psi(x,y) \le t} dF(x,y) . \tag{3.A.1}$$


For any couple $(x^*, y^*)$ such that $\psi(x^*, y^*) = t$,


$$\Pr\left[\psi(X,Y) \le t\right] \ge \Pr\left[X \le x^*, Y \le y^*\right] = F(x^*, y^*) , \tag{3.A.2}$$


since $F(x^*, y^*) = \int_{x \le x^*,\, y \le y^*} dF(x,y)$ is the $F$-measure of the area hatched
with dashed lines in Fig. 3.13, which is included within the area representing
the set of points $\{(x,y) : \psi(x,y) \le t\}$. Given any copula $C_{\inf}$ such that

$$C_{\inf}(u,v) \le C(u,v), \quad \forall (u,v) \in [0,1]^2 , \tag{3.A.3}$$

where $C$ denotes the copula of the random vector $(X,Y)$, we can write:

$$\Pr\left[\psi(X,Y) \le t\right] \ge C_{\inf}\left(F_X(x^*), F_Y(y^*)\right) , \tag{3.A.4}$$

for all $(x^*, y^*)$ such that $\psi(x^*, y^*) = t$, which finally allows us to assert that:


$$\Pr\left[\psi(X,Y) \le t\right] \ge \sup_{\psi(x^*, y^*) = t} C_{\inf}\left(F_X(x^*), F_Y(y^*)\right) . \tag{3.A.5}$$

This equation is equivalent to (3.73) under the restrictive assumptions
retained in this simple proof.

The proof of the second inequality of Theorem 3.4.1 follows the same line
of reasoning. One has just to consider $\Pr[\psi(X,Y) > t]$, which leads, mutatis
mutandis, to a lower bound on the survival copula of $(X,Y)$.


3.B Sketch of a Proof of a Large Deviation Theorem 
for Portfolios Made of Weibull Random Variables 


Let $X_1, X_2, \dots, X_N$ be $N$ iid random variables with density $p(\cdot)$. Let us
denote by $f(\cdot)$ and $g(\cdot)$ two positive functions such that $p(\cdot) = g(\cdot)\, e^{-f(\cdot)}$.
Let $w_1, w_2, \dots, w_N$ be $N$ real (positive) non-random coefficients, and $S =
\sum_{i=1}^{N} w_i X_i$.

Let $\mathcal{X} = \{x \in R^N, \sum_{i=1}^{N} w_i x_i = S\}$. The density of the variable $S$ is given
by


$$P_S(S) = \int_{\mathcal{X}} dx\; e^{-\sum_{i=1}^{N}\left[f(x_i) - \ln g(x_i)\right]} . \tag{3.B.6}$$


We will assume the following conditions on the function $f$:


1. $f(\cdot)$ is three times continuously differentiable and four times differentiable,
2. $f^{(2)}(x) > 0$, for $|x|$ large enough,
3. $\lim_{x \to \pm\infty} \dfrac{f^{(3)}(x)}{\left(f^{(2)}(x)\right)^2} = 0$,
4. $f^{(3)}$ is asymptotically monotonic,
5. there is a constant $\beta > 1$ such that $\dfrac{f^{(3)}(\beta \cdot x)}{f^{(3)}(x)}$ remains bounded as $x$ goes to
infinity,
6. there exists $C_1, C_2 > 0$ and some $\nu > 0$ such that $C_1 \cdot x^{\nu} \le g(x) \le C_2 \cdot x^{\nu}$,
as $x$ goes to infinity.


Under the assumptions stated above, the leading order expansion of $P_S(S)$
for large $S$ and finite $N > 1$ is obtained by a generalization of Laplace’s
method which assumes that the set of $x_i^*$’s that maximize the integrand in
(3.B.6) are a solution of


$$f'(x_i^*) = \sigma(S)\, w_i , \tag{3.B.7}$$


where $\sigma(S)$ is nothing but a Lagrange multiplier introduced to minimize the
expression $\sum_{i=1}^{N} f(x_i)$ under the constraint $\sum_{i=1}^{N} w_i x_i = S$. This constraint
shows that at least one $x_i$, for instance $x_1$, goes to infinity as $S \to \infty$. Since
$f'(x_1)$ is an increasing function by Assumption 2, which goes to infinity as
$x_1 \to +\infty$ (Assumption 3), expression (3.B.7) shows that $\sigma(S)$ goes to infinity
with $S$, as long as the weight of the asset 1 is not zero. Putting the divergence
of $\sigma(S)$ with $S$ in expression (3.B.7) for $i = 2, \dots, N$ ensures that each $x_i^*$
increases when $S$ increases and goes to infinity when $S$ goes to infinity.

Expanding $f(x_i)$ around $x_i^*$ yields

$$f(x_i) = f(x_i^*) + f'(x_i^*) \cdot h_i + \int_{x_i^*}^{x_i^* + h_i} dt \int_{x_i^*}^{t} du\; f''(u) , \tag{3.B.8}$$


where the set of $h_i = x_i - x_i^*$ obey the condition


$$\sum_{i=1}^{N} w_i h_i = 0 . \tag{3.B.9}$$


Summing (3.B.8) over $i$ in the presence of relation (3.B.9), and using (3.B.7), we obtain


$$\sum_{i=1}^{N} f(x_i) = \sum_{i=1}^{N} f(x_i^*) + \sum_{i=1}^{N} \int_{x_i^*}^{x_i^* + h_i} dt \int_{x_i^*}^{t} du\; f''(u) . \tag{3.B.10}$$


Thus $\exp\left(-\sum f(x_i)\right)$ can be rewritten as follows:


$$\exp\left[-\sum_{i=1}^{N} f(x_i)\right] = \exp\left[-\sum_{i=1}^{N} f(x_i^*)\right] \cdot \exp\left[-\sum_{i=1}^{N} \int_{x_i^*}^{x_i^* + h_i} dt \int_{x_i^*}^{t} du\; f''(u)\right] . \tag{3.B.11}$$

Let us now define the compact set $A_C = \{h \in R^N, \sum_{i=1}^{N} f''(x_i^*)^2 \cdot h_i^2 \le C^2\}$
for any given positive constant $C$ and the set $H = \{h \in R^N, \sum_{i=1}^{N} w_i h_i = 0\}$.
We can thus write


\begin{align}
P_S(S) &= \int_{H} dh\; e^{-\sum_{i=1}^{N}\left[f(x_i) - \ln g(x_i)\right]} \tag{3.B.12}\\
&= \int_{A_C \cap H} dh\; e^{-\sum_{i=1}^{N}\left[f(x_i) - \ln g(x_i)\right]} + \int_{\overline{A_C} \cap H} dh\; e^{-\sum_{i=1}^{N}\left[f(x_i) - \ln g(x_i)\right]} , \tag{3.B.13}
\end{align}

where $x_i = x_i^* + h_i$ and $\overline{A_C}$ denotes the complement of $A_C$.

Let us analyze in turn the two integrals of the right-hand side of (3.B.13). 
Concerning the first integral, it can be shown that 


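$$\lim_{S \to \infty} \frac{\displaystyle\int_{A_C \cap H} dh\; e^{-\sum_{i=1}^{N}\left[\int_{x_i^*}^{x_i^* + h_i} dt \int_{x_i^*}^{t} du\, f''(u) \,-\, \ln g(x_i^* + h_i)\right]}}{\dfrac{(2\pi)^{\frac{N-1}{2}} \prod_{i=1}^{N} g(x_i^*)}{\sqrt{\sum_{i=1}^{N} w_i^2 \prod_{j \ne i} f''(x_j^*)}}} = 1 , \quad \text{for some positive } C . \tag{3.B.14}$$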


The cumbersome proof of this assertion is found in [336]. It is based upon the
fact that

1. by Assumptions 1, 3, 4 and 5, for all $h \in A_C$ and all $\epsilon_i > 0$


SUD¢eEg, f® (€)| 
f" (af) 


<e;, for 2; large enough, (3.B.15) 


where Gj = c fr (aty? 4 | sy 


a 


2. for all $\epsilon_i' > 0$ and $x_i^*$ large enough:


gaz + hi) 


Vhe Ac, 1—¢)" < 
Oe a) EG) 


eee, (3.B.16) 


by Assumptions 1 and 6. 
Now, for the second integral on the right-hand side of (3.B.13), we have
to show that

$$\int_{\overline{A_C} \cap H} dh\; e^{-\sum_{i=1}^{N}\left[f(x_i^* + h_i) - \ln g(x_i^* + h_i)\right]} \tag{3.B.17}$$


can be neglected. This is obvious since, by Assumptions 2 and 6, the function
$f(x) - \ln g(x)$ remains convex for $x$ large enough, which ensures that $f(x) -
\ln g(x) \ge C_1 |x|$ for some positive constant $C_1$ and $x$ large enough. Thus,
choosing the constant $C$ in $A_C$ large enough, we have


| dh e~ a S(@i)-Inges) < ‘e dh e7 G1 Sa leet hil C (cr) 
AcnH AcnH 
(3.B.18) 


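for some positive $\alpha$. Thus, for $S$ large enough, the density $P_S(S)$ is
asymptotically equal to

$$P_S(S) \simeq \frac{(2\pi)^{\frac{N-1}{2}} \prod_{i=1}^{N} g(x_i^*)}{\sqrt{\sum_{i=1}^{N} w_i^2 \prod_{j \ne i} f''(x_j^*)}}\; \exp\left[-\sum_{i=1}^{N} f(x_i^*)\right] . \tag{3.B.19}$$

In the case of the modified Weibull variables, we have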


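$$f(x) = \left(\frac{|x|}{\chi}\right)^{c} \tag{3.B.20}$$

and

$$g(x) = \frac{c}{2\sqrt{\pi}\, \chi^{c/2}}\, |x|^{\frac{c}{2} - 1} , \tag{3.B.21}$$


which satisfies our assumptions if and only if $c > 1$. In such a case, we obtain

$$x_i^* = \frac{w_i^{\frac{1}{c-1}}}{\sum_{j=1}^{N} w_j^{\frac{c}{c-1}}} \cdot S , \tag{3.B.22}$$


which, after some simple algebraic manipulations, yields


$$P_S(S) \sim \left[\frac{c}{2(c-1)}\right]^{\frac{N-1}{2}} \frac{c}{2\sqrt{\pi}}\, \frac{1}{\hat{\chi}^{c/2}}\, |S|^{\frac{c}{2} - 1}\, e^{-\left(\frac{|S|}{\hat{\chi}}\right)^{c}} \tag{3.B.23}$$

with

$$\hat{\chi} = \left(\sum_{i=1}^{N} w_i^{\frac{c}{c-1}}\right)^{\frac{c-1}{c}} \cdot \chi . \tag{3.B.24}$$
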
Let us now consider $N$ independent random variables $X_1, X_2, \dots, X_N$ with
modified-Weibull pdfs with the same exponent $c > 1$ but different scale factors
$\chi_i$. Let $w_1, w_2, \dots, w_N$ be $N$ non-random real coefficients. Then, the variable

$$S_N = w_1 X_1 + w_2 X_2 + \cdots + w_N X_N \tag{3.B.25}$$


follows asymptotically a modified-Weibull with scale factor


$$\hat{\chi} = \left(\sum_{i=1}^{N} |w_i \chi_i|^{\frac{c}{c-1}}\right)^{\frac{c-1}{c}} . \tag{3.B.26}$$


Indeed, let $Y_1, Y_2, \dots, Y_N$ be $N$ independent and identically distributed
random variables with modified-Weibull pdfs with same exponent $c > 1$ and scale
factor $\chi = 1$. Then,


$$(X_1, X_2, \dots, X_N) \stackrel{\text{law}}{=} (\chi_1 Y_1, \chi_2 Y_2, \dots, \chi_N Y_N) , \tag{3.B.27}$$

which yields

$$S_N \stackrel{d}{=} w_1 \chi_1 \cdot Y_1 + w_2 \chi_2 \cdot Y_2 + \cdots + w_N \chi_N \cdot Y_N . \tag{3.B.28}$$


Thus, (3.B.26) immediately follows from (3.B.24).
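
The closed forms (3.B.22) and (3.B.26) are easy to check numerically. The short Python sketch below (function names are ours, and the weights are assumed positive in `saddle_point`) verifies two consistency properties: the dominant configuration satisfies the constraint $\sum_i w_i x_i^* = S$, and the exponent $\sum_i f(x_i^*)$ equals $(S/\hat{\chi})^c$.

```python
def saddle_point(weights, total, c):
    """Dominant configuration x_i^* for sum_i w_i x_i = total (weights > 0)."""
    norm = sum(w ** (c / (c - 1.0)) for w in weights)
    return [w ** (1.0 / (c - 1.0)) * total / norm for w in weights]


def effective_scale(weights, scales, c):
    """Scale factor of the portfolio S_N = sum_i w_i X_i.

    weights: the w_i; scales: the chi_i; c: common Weibull exponent (c > 1).
    """
    if c <= 1.0:
        raise ValueError("the asymptotic result requires c > 1")
    a = c / (c - 1.0)
    return sum(abs(w * s) ** a for w, s in zip(weights, scales)) ** (1.0 / a)


if __name__ == "__main__":
    w, S, c = [0.3, 0.7], 10.0, 1.5
    x_star = saddle_point(w, S, c)
    chi_hat = effective_scale(w, [1.0, 1.0], c)
    print(x_star, sum(x ** c for x in x_star), (S / chi_hat) ** c)
```

For $c = 2$ the modified Weibull is Gaussian and the formula reduces to the familiar quadratic aggregation $\hat{\chi}^2 = \sum_i (w_i \chi_i)^2$.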


3.C Relation Between the Objective 
and the Risk-Neutral Copula 


Assuming that we have a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0 \le t \le T}, P)$,
where $P$ denotes the objective or historical probability measure, generated by a
$2N$-dimensional Brownian motion $(B_1, W_1, \dots, B_N, W_N)$ with (constant) correlation
matrix $\rho$, let us consider the $N$-dimensional stochastic volatility model:

$$\frac{dS_i(t)}{S_i(t)} = \mu_i\left(t, \sigma_i(t)\right) dt + \sigma_i(t)\, dB_i(t) , \tag{3.C.29}$$

$$d\sigma_i(t) = a_i\left(t, \sigma_i(t)\right) dt + b_i\left(t, \sigma_i(t)\right) dW_i(t) , \tag{3.C.30}$$

where $S_i$ is the price of asset $i$, while $a_i(\cdot,\cdot)$ and $b_i(\cdot,\cdot)$ are chosen so that the
volatility $\sigma_i(t)$ of each asset remains positive almost surely. As an example,
one can choose


$$a_i(t, \sigma_i) = \kappa_i \left(\frac{m_i}{\sigma_i} - \sigma_i\right) , \quad \text{and} \quad b_i(t, \sigma_i) = \beta_i . \tag{3.C.31}$$


This stochastic volatility model is equivalent to the Heston model [232] written
for the squared volatility instead of the volatility itself. In the present case,
the condition $\kappa_i \cdot m_i > \beta_i^2$, together with $\kappa_i, m_i > 0$, ensures the positivity of
$\sigma_i(t)$, provided that $\sigma_i(0) > 0$.

The solution of (3.C.29) with $S_i(0) = S_i^0$ is:


$$S_i(t) = S_i^0 \exp\left[\int_0^t \left(\mu_i\left(s, \sigma_i(s)\right) - \frac{1}{2}\sigma_i(s)^2\right) ds + \int_0^t \sigma_i(s)\, dB_i(s)\right] , \tag{3.C.32}$$


where $\sigma_i(t)$ is solution of (3.C.30). Denoting by $Z_i(t)$ the random variable


$$\int_0^t \left(\mu_i\left(s, \sigma_i(s)\right) - \frac{1}{2}\sigma_i(s)^2\right) ds + \int_0^t \sigma_i(s)\, dB_i(s) , \tag{3.C.33}$$


we can assert that the copula $C^{P}$ of $(S_1(t), \dots, S_N(t))$ is the same as the
copula of $(Z_1(t), \dots, Z_N(t))$, since each $S_i(t) = S_i^0 \cdot \exp[Z_i(t)]$ is an increasing
transform of the corresponding $Z_i(t)$.

Assuming that the usual conditions are satisfied, the Girsanov Theorem¹⁹
allows us to assert that there exists a probability measure $Q$, equivalent to $P$
on $\mathcal{F}_T$, such that


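$$\left.\frac{dQ}{dP}\right|_{\mathcal{F}_T} = \exp\left[-\int_0^T \theta(s)^{\top} \rho^{-1}\, dX(s) - \frac{1}{2}\int_0^T \theta(s)^{\top} \rho^{-1}\, \theta(s)\, ds\right] , \tag{3.C.34}$$

written here in vector form with $X = (B_1, W_1, \dots, B_N, W_N)^{\top}$ and
$\theta = (\mu_1/\sigma_1, \lambda_1, \dots, \mu_N/\sigma_N, \lambda_N)^{\top}$,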

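for any suitable processes $(\lambda_1, \dots, \lambda_N)$, and that

$$\tilde{B}_i(t) = B_i(t) + \int_0^t \frac{\mu_i\left(s, \sigma_i(s)\right)}{\sigma_i(s)}\, ds, \quad i = 1, \dots, N \tag{3.C.35}$$

$$\tilde{W}_i(t) = W_i(t) + \int_0^t \lambda_i\left(s, \sigma_i(s)\right) ds, \quad i = 1, \dots, N \tag{3.C.36}$$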

19 In the theory of probability, the Girsanov Theorem specifies how stochastic 
processes change under changes in measure. The theorem is especially important 
in the theory of asset pricing as it allows one to convert the physical measure 
which describes the probability that an underlying (such as a share price or in- 
terest rate) will take a particular value into the risk-neutral measure used for 
evaluating the derivatives on the underlying.

are Brownian motions under $Q$, with correlation matrix $\rho$. Since the volatility
is a non-traded asset, the problem of market incompleteness arises, so that
there is not a unique risk-neutral measure such that discounted asset prices
are martingales.

For simplicity, let us assume that the risk-free interest rate is vanishing so
that asset prices are directly discounted prices. Under any $Q$, using (3.C.35)
and (3.C.36), (3.C.29-3.C.30) can be written


$$\frac{dS_i(t)}{S_i(t)} = \sigma_i(t)\, d\tilde{B}_i(t), \quad i = 1, \dots, N \tag{3.C.37}$$

$$d\sigma_i(t) = \left[a_i\left(t, \sigma_i(t)\right) - \lambda_i\left(t, \sigma_i(t)\right) \cdot b_i\left(t, \sigma_i(t)\right)\right] dt + b_i\left(t, \sigma_i(t)\right) d\tilde{W}_i(t) , \tag{3.C.38}$$


which shows that $S_i(t)$ is a $Q$-martingale. The solution of (3.C.37) with
$S_i(0) = S_i^0$, under $Q$, is:


$$S_i(t) = S_i^0 \exp\left[-\frac{1}{2}\int_0^t \sigma_i(s)^2\, ds + \int_0^t \sigma_i(s)\, d\tilde{B}_i(s)\right] , \tag{3.C.39}$$


where $\sigma_i(t)$ is now the solution of (3.C.38). Denoting by $\tilde{Z}_i(t)$ the random
variable


$$-\frac{1}{2}\int_0^t \sigma_i(s)^2\, ds + \int_0^t \sigma_i(s)\, d\tilde{B}_i(s) , \tag{3.C.40}$$


we can assert that the copula $C^{Q}$ of $(S_1(t), \dots, S_N(t))$ is the same as the
copula of $(\tilde{Z}_1(t), \dots, \tilde{Z}_N(t))$.

Therefore, $C^{P} = C^{Q}$ if and only if the copula of $(Z_1(t), \dots, Z_N(t))$ is the
same as the copula of $(\tilde{Z}_1(t), \dots, \tilde{Z}_N(t))$. In the general case, the $Z_i(t)$’s and
$\tilde{Z}_i(t)$’s are not simple increasing transforms of each other. Therefore, their
copulas are not identical and $C^{P} \ne C^{Q}$. But in the particular case where the
rates $\mu_i$ are deterministic functions, i.e., independent of $\sigma_i(t)$, the copula
$C^{P}$ is nothing but the copula of the random variables:


$$Z_i'(t) = -\frac{1}{2}\int_0^t \sigma_i(s)^2\, ds + \int_0^t \sigma_i(s)\, dB_i(s), \quad i = 1, \dots, N , \tag{3.C.41}$$

where $\sigma_i(t)$ is the solution of (3.C.30), since the maps

$$x \mapsto S_i^0 \exp\left[\int_0^t \mu_i(s)\, ds + x\right] \tag{3.C.42}$$


are monotonically increasing functions of their argument. If the market prices
$\lambda_i$’s of volatility risks are vanishing, the vectors $(Z_1'(t), \dots, Z_N'(t))$ and
$(\tilde{Z}_1(t), \dots, \tilde{Z}_N(t))$ are equal in law, since (3.C.30) and (3.C.38) are then the
same. Thus, in this case, $(Z_1'(t), \dots, Z_N'(t))$ and $(\tilde{Z}_1(t), \dots, \tilde{Z}_N(t))$ have the
same copula, and therefore $C^{P} = C^{Q}$.


4


Measures of Dependences 


In the previous chapter, we have shown how to describe with copulas the 
general dependence structure of several random variables, with the goal of 
modeling baskets of asset returns, or more generally, any multivariate financial 
risk. However, the general framework provided by copulas does not exclude 
more specific measures of dependences that can be useful to target particular 
ranges of variations of the random variables. 

This chapter presents and describes in detail the most important depen- 
dence measures. Starting with the description of the basic concept of linear de- 
pendence, through linear correlation and canonical N-correlation coefficients, 
we then focus on concordance measures and on more interesting families of 
dependence measures. We then turn to measures of extreme dependence. In 
each case, we underline their relationship with copulas. 


4.1 Linear Correlations 


4.1.1 Correlation Between Two Random Variables 


The linear correlation is probably still the most widespread measure of de- 
pendence, both in finance and insurance. Given two random variables X and 
Y, the linear correlation coefficient is defined as: 


$$\rho(X,Y) = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X] \cdot \mathrm{Var}[Y]}} , \tag{4.1}$$


provided that the variances $\mathrm{Var}[X]$ and $\mathrm{Var}[Y]$ exist. $\mathrm{Cov}[X, Y]$ is the
covariance of $X$ and $Y$. The coefficient $\rho(X,Y)$ is called a linear correlation
coefficient because its knowledge is equivalent to that of the coefficient $\beta$ of
the linear regression $Y = \beta X + \epsilon$, where $\epsilon$ is the residual which is linearly
uncorrelated with $X$. We have indeed $\rho = \beta \sqrt{\mathrm{Var}[X] / \mathrm{Var}[Y]}$.

Fig. 4.1. Graph of the variable $V = \sin\omega$ versus $U = \cos\omega$ for $\omega \in [0, 2\pi]$ (left
panel) and graph of the variable $V = \frac{U}{\theta} \cdot 1_{U \in [0,\theta)} + \frac{1-U}{1-\theta} \cdot 1_{U \in [\theta,1]}$ (right panel)

Regularly varying random variables (power-like random variables) with a 
tail index less than two do not have finite variances; they thus do not admit a 
correlation coefficient. In addition, when the tail index belongs to the interval 
(2, 4], the correlation coefficient exists but its Pearson estimator, based on a
sample of size $T$, $\{(X_i, Y_i)\}_{i=1}^{T}$,


$$\hat{\rho}_T = \frac{\frac{1}{T}\sum_{i=1}^{T} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\frac{1}{T}\sum_{i=1}^{T} (X_i - \bar{X})^2 \cdot \frac{1}{T}\sum_{i=1}^{T} (Y_i - \bar{Y})^2}} , \tag{4.2}$$


where $\bar{X}$ and $\bar{Y}$ denote the sample means of $X$ and $Y$ respectively, performs
rather poorly, insofar as its asymptotic distribution is not Gaussian but Lévy
stable [356]. Therefore, a sample correlation coefficient may exhibit large
deviations from its true value, providing very inaccurate estimates. This is par-
ticularly problematic for financial purposes since, as recalled in Chap. 2, the 
existence of the fourth moment for the distribution of stock returns is still a 
topic of active debate. 

Considering two independent random variables, it is well known that their 
correlation coefficient equals zero. However, the converse does not hold.
Indeed, given a random variable $\omega$ uniformly distributed in $[0, 2\pi]$, let us define
the couple of random variables:


$$(U, V) = (\cos\omega, \sin\omega) . \tag{4.3}$$

It is easy to check that $\rho(U,V) = 0$, even though the two random variables
are not independent, as shown in the left panel of Fig. 4.1 which plots the
variable $V$ as a function of $U$.

More striking is the case where the knowledge of one of the variables com- 
pletely determines the other one. As an example, consider a random variable 
U, uniformly distributed on [0,1] and the random variable V defined by: 


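$$V = \begin{cases} \dfrac{U}{\theta} , & U \in [0, \theta) , \\[2mm] \dfrac{1-U}{1-\theta} , & U \in [\theta, 1] , \end{cases} \tag{4.4}$$


for some $\theta \in [0, 1]$ (see right panel of Fig. 4.1). One can easily show that $V$ is
also uniformly distributed on $[0,1]$ and that


$$\rho(U,V) = 2\theta - 1 , \tag{4.5}$$


so that $U$ and $V$ are uncorrelated for $\theta = 1/2$ while $V$ remains perfectly
predictable from $U$.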
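
Both counterexamples can be reproduced numerically with the Pearson estimator (4.2). In the Python sketch below, the uniform random variable is replaced by a regular grid of midpoints on $[0,1]$, an assumption made only to obtain a deterministic illustration; the helper names `pearson` and `tent` are ours.

```python
import math


def pearson(xs, ys):
    """Sample (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def tent(u, theta):
    """V as a deterministic function of U; needs 0 < theta < 1."""
    return u / theta if u < theta else (1.0 - u) / (1.0 - theta)


n = 100_000
grid = [(k + 0.5) / n for k in range(n)]          # midpoints of [0, 1]

# Circle example: omega spread evenly over [0, 2 pi].
omega = [2.0 * math.pi * u for u in grid]
rho_circle = pearson([math.cos(w) for w in omega], [math.sin(w) for w in omega])

# Tent-map example: the correlation should be close to 2 * theta - 1.
rho_half = pearson(grid, [tent(u, 0.5) for u in grid])
rho_08 = pearson(grid, [tent(u, 0.8) for u in grid])

if __name__ == "__main__":
    print(rho_circle, rho_half, rho_08)
```

In both cases $V$ is an exact function of the driving variable, yet the computed correlation is essentially zero for the circle and for $\theta = 1/2$.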
When two random variables, $X$ and $Y$, are linearly dependent:


$$Y = \alpha + \beta \cdot X , \tag{4.6}$$


the correlation coefficient $\rho(X, Y)$ equals $\pm 1$, depending on whether $\beta$ is
positive or negative (in the previous example, this corresponds to $\theta = 1$ or 0,
respectively). Here, the converse holds. This derives from the representation:


$$\rho(X,Y)^2 = 1 - \min_{\alpha, \beta} \frac{E\left[\left(Y - (\alpha + \beta \cdot X)\right)^2\right]}{\mathrm{Var}[Y]} , \tag{4.7}$$


where $E[\cdot]$ denotes the expectation with respect to the joint distribution of
$X$ and $Y$. $\rho(X,Y)^2$ is called the coefficient of determination and gives the
proportion of the variance of one variable ($Y$) that is predictable from the
other variable ($X$).

By the Cauchy-Schwarz inequality, (4.1) allows one to show that $\rho \in [-1, 1]$.
But, given two random variables $X$ and $Y$ with fixed marginal distribution
functions $F_X$ and $F_Y$, it is not always possible for the correlation coefficient
to reach the bounds $\pm 1$. Indeed, Chap. 3 has shown that any bivariate
distribution function $F$ is bracketed by the Fréchet-Hoeffding bounds:


$$\max\{F_X(x) + F_Y(y) - 1, 0\} \le F(x,y) \le \min\{F_X(x), F_Y(y)\} . \tag{4.8}$$


Therefore, applying Hoeffding identity [130]


$$\rho(X,Y) = \frac{1}{\sqrt{\mathrm{Var}[X] \cdot \mathrm{Var}[Y]}} \iint \left[F(x,y) - F_X(x)\, F_Y(y)\right] dx\, dy \tag{4.9}$$


one can now conclude that, given $F_X$ and $F_Y$, the correlation coefficient $\rho$
lies between $\rho_{\min}$ and $\rho_{\max}$, where $\rho_{\min}$ is attained when $X$ and $Y$ are
countermonotonic random variables while $\rho_{\max}$ is attained when $X$ and $Y$ are
comonotonic random variables.

As an illustration, let us consider the following example from Embrechts
et al. [149]. Given two random variables with log-normal marginal
distributions: $X \sim \log N(0,1)$ and $Y \sim \log N(0,\sigma)$, the upper and lower bounds for
$\rho(X,Y)$ are given by


$$\rho_{\min} = \rho\left(e^{Z}, e^{-\sigma Z}\right) \quad \text{and} \quad \rho_{\max} = \rho\left(e^{Z}, e^{\sigma Z}\right) , \tag{4.10}$$


where $Z$ is a standard Gaussian random variable. A straightforward
calculation, based upon the fact that


$$E\left[e^{sZ}\right] = e^{s^2/2} , \tag{4.11}$$

gives

$$\rho_{\min} = \frac{e^{-\sigma} - 1}{\sqrt{(e - 1)\left(e^{\sigma^2} - 1\right)}} \quad \text{and} \quad \rho_{\max} = \frac{e^{\sigma} - 1}{\sqrt{(e - 1)\left(e^{\sigma^2} - 1\right)}} . \tag{4.12}$$


Figure 4.2 represents these two bounds as a function of $\sigma$. As $\sigma$ becomes
of the order of or larger than 3, $\rho_{\min}$ becomes extremely close to zero, so
that in this case an (almost) vanishing correlation coefficient corresponds to
a countermonotonic relation between the two random variables. For $\sigma$ larger
than 4, both the lower and the upper bounds can hardly be distinguished
from zero. Thus, a very small value of the correlation coefficient cannot (must
not) be always considered as the signature of a weak dependence between two
random variables.
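
The orders of magnitude quoted above follow directly from (4.12). As a small illustrative sketch (the function name is ours), the two bounds can be tabulated as follows:

```python
import math


def lognormal_correlation_bounds(sigma):
    """(rho_min, rho_max) for X ~ log N(0, 1) and Y ~ log N(0, sigma)."""
    denom = math.sqrt((math.e - 1.0) * math.expm1(sigma**2))
    return math.expm1(-sigma) / denom, math.expm1(sigma) / denom


if __name__ == "__main__":
    for s in (0.5, 1.0, 2.0, 3.0, 4.0):
        lo, hi = lognormal_correlation_bounds(s)
        print(f"sigma={s:.1f}  rho_min={lo:+.4f}  rho_max={hi:+.4f}")
```

For $\sigma = 1$ the upper bound equals one, as it must when the two margins coincide, while for $\sigma = 3$ the lower bound is already of the order of $-0.01$.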

Fig. 4.2. Graph of ρ_min and ρ_max given by (4.12) versus σ for two random variables
with log-normal marginal distributions: log N(0,1) and log N(0,σ)

The correlation coefficient is invariant under an increasing affine change
of variable of the form

X' = a \cdot X + b, \quad a > 0 ,    (4.13)
Y' = c \cdot Y + d, \quad c > 0 ,    (4.14)



since ρ(X′, Y′) = ρ(X,Y). However, this property does not generalize to any
(nonlinear) increasing transformation. As a consequence, the correlation coef- 
ficient does not give access to the dependence between two random variables 
in the sense of Chap. 3. This lack of invariance with respect to nonlinear 
changes of variables is due to the fact that the correlation coefficient aggre- 
gates information on both the marginal behavior of each random variable and 
on their true dependence structure given by the copula. 
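This lack of invariance is easy to observe on simulated data. The sketch below, which assumes a Gaussian pair with correlation 0.8 generated with a fixed seed (both choices are arbitrary), compares Pearson's coefficient before and after an increasing affine map and after an increasing nonlinear map (exponentiation).

```python
import math
import random


def pearson(xs, ys):
    """Sample linear correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)                     # fixed seed, for reproducibility only
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [0.8 * xi + 0.6 * rng.gauss(0.0, 1.0) for xi in x]   # Corr(X, Y) = 0.8

rho = pearson(x, y)
# Increasing affine maps leave the coefficient unchanged (up to rounding).
rho_affine = pearson([2.0 * xi + 1.0 for xi in x], [0.5 * yi - 3.0 for yi in y])
# Increasing nonlinear maps keep the copula but change the coefficient.
rho_exp = pearson([math.exp(xi) for xi in x], [math.exp(3.0 * yi) for yi in y])

print(f"original: {rho:.3f}   affine: {rho_affine:.3f}   exponential: {rho_exp:.3f}")
```

The copula of the pair is the same in the three cases; only the marginals differ, which is enough to change the linear correlation markedly.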


4.1.2 Local Correlation 


Instead of focusing on the overall correlation, one can look at the local linear 
dependence between two random variables. This idea, introduced by Doksum 
et al. [58, 134], enables one to probe the changes of the correlation strength as 
a function of the value of the realizations of the random variables. It allows
one, for instance, to address the question of whether the correlation remains
constant or varies according to whether the realizations of the random variables
are typical or not. This
is particularly useful when dealing with contagions of crises (see Chap. 6) 
or when investigating whether flight-to-quality actually occurs between stock 
and bond markets, for instance. 

The definition of the local correlation coefficient is quite natural. It starts 
from the remark that, in a linear framework, if the two random variables X 
and Y are related by 


Y = \alpha + \beta X + \epsilon ,    (4.15)

where ε is independent from (or at least uncorrelated with) X, the correlation
coefficient reads

\rho = \frac{\beta\, \sigma_X}{\sqrt{\beta^2 \sigma_X^2 + \sigma_\epsilon^2}} ,    (4.16)

where σ_X² and σ_ε² denote respectively the variance of X and of the error term
ε.



Let us now assume that the more general relation

Y = f(X) + \sigma(X) \cdot \epsilon    (4.17)

holds between X and Y, with σ_ε = 1 and f differentiable. In the neighborhood
of X = x_0, one can linearize the relation above as follows:

Y = \left[ f(x_0) - x_0 \cdot f'(x_0) \right] + f'(x_0)\, X + \sigma(x_0)\, \epsilon    (4.18)

and, by analogy with (4.16), define the local linear correlation coefficient by

\rho(x_0) = \frac{\sigma_X\, f'(x_0)}{\sqrt{\sigma_X^2\, f'(x_0)^2 + \sigma(x_0)^2}} .    (4.19)


It is straightforward to check that the local correlation coefficient reduces to
the usual linear correlation coefficient when f is an affine mapping and σ(x)
remains constant. In addition, the local correlation coefficient ρ(x) fulfills the
same main properties as the linear correlation coefficient ρ:

1. ρ(x) ∈ [−1, 1],
2. ρ(x) is invariant under (increasing) linear mappings in both X and Y,
3. ρ(x) = 0 for all x if X and Y are independent.

Besides, and in contrast with the linear correlation coefficient, the local
correlation coefficient equals ±1 only if σ(x) is zero (the sign depends on that of
the derivative of f), so that Y = f(X). Thus, the local correlation coefficient
avoids the drawback of the linear correlation coefficient that a vanishing value 
can be found even when X and Y are deterministically related to each other. 
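As a minimal sketch of this definition, the function below substitutes f′(x_0) for β and σ(x_0) for σ_ε in (4.16). The model used in the usage lines (f(x) = x², unit noise amplitude, unit variance of X) is a hypothetical example chosen for illustration, not one taken from the text.

```python
import math


def local_correlation(f_prime, sigma, sigma_x, x0):
    """Local linear correlation at x0 for Y = f(X) + sigma(X) * eps, Var(eps) = 1.

    f_prime : callable returning the derivative f'(x)
    sigma   : callable returning the local noise amplitude sigma(x)
    sigma_x : standard deviation of X
    """
    slope = f_prime(x0)
    noise = sigma(x0)
    denom = math.sqrt((sigma_x * slope) ** 2 + noise ** 2)
    if denom == 0.0:
        raise ValueError("undefined when both f'(x0) and sigma(x0) vanish")
    return sigma_x * slope / denom


# Hypothetical model: f(x) = x**2 with homoscedastic noise sigma(x) = 1.
def rho_at(x0):
    return local_correlation(lambda x: 2.0 * x, lambda x: 1.0, 1.0, x0)


for x0 in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x0 = {x0:+.1f}   local correlation = {rho_at(x0):+.4f}")
```

In this example the local correlation changes sign with x_0 and approaches ±1 far from the origin, whereas the overall linear correlation of a symmetric X with X² would vanish.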


4.1.3 Generalized Correlations Between N > 2 Random Variables 


The (overall) correlation coefficient ρ is a linear measure of dependence
between two random variables. We now present a natural generalization to N
random variables, whose exposition borrows from [324]. 

Let us denote by X(t) a random vector of N components, for instance the 
vector of returns of N assets in a portfolio. The mean values of the components 
of X(t) are first estimated and then subtracted from each vector X(t) for t =
1,...,L, where L denotes the sample size, equal for instance to the chosen
length of the time interval used for the estimations. For ease of notation, we
keep X(t) to represent the now centered vectors. The sample estimate of the
covariance matrix of these N random variables over some interval of length L
is

\hat{S}_{XX} = \frac{1}{L} \sum_{t=1}^{L} X(t)\, X(t)^T ,    (4.20)

where ^T denotes the transpose.

Let us now divide the N components of the vectors X(t) into two parts: a
scalar X_i(t) constituted of one of the components and an (N − 1)-dimensional
column vector ξ_i(t) = [X_1(t),..., X_{i−1}(t), X_{i+1}(t),..., X_N(t)]^T made of the
other components. By multiplying (scalar product) each vector ξ_i by some
still unknown vector φ, we obtain a set of scalar values ζ_i = φ^T · ξ_i. Let us
now search for the vector φ which makes the square of the correlation
coefficient between the two random variables X_i and ζ_i maximum. This procedure
constitutes an example of the implementation of the classical solution
developed by Hotelling [235, 398] on canonical correlations: the vector φ is defined
as the eigenvector corresponding to the maximal eigenvalue (which is equal
to the maximal squared correlation coefficient between the two random
variables X_i and ζ_i) of the following matrix of size (N − 1) × (N − 1):

S_{\xi_i \xi_i}^{-1} \, S_{\xi_i X_i} \, S_{X_i \xi_i} \, S_{X_i X_i}^{-1} ,    (4.21)

where

S_{X_i X_i} = \mathrm{Cov}(X_i, X_i) , \quad S_{X_i \xi_i} = S_{\xi_i X_i}^T = \mathrm{Cov}(X_i, \xi_i^T) , \quad S_{\xi_i \xi_i} = \mathrm{Cov}(\xi_i, \xi_i^T) .    (4.22)


The matrices in formulas (4.21) and (4.22) are submatrices of the general
N × N covariance matrix S_XX = Cov(X, X^T) (whose estimation is given
in (4.20)). Thus, replacing the matrix S_XX (and its submatrices) in (4.21)
and (4.22) by its sample estimate (4.20) allows one to compute the vector φ
and the set of scalar values ζ_i for i = 1,...,N. One can call the maximum
eigenvalue of the matrix (4.21) the “canonical coefficient of N-correlation”
between the random variable X_i and the other N − 1 variables, which captures
the common factors between X_i and all the other N − 1 variables. Performing
similar operations with all other components of the vector X, one thus obtains
an N-dimensional vector of canonical coefficients of N-correlation equal to the
largest eigenvalues of the matrices (4.21) for i = 1,...,N. For N = 2, the
(N − 1)-dimensional matrix (4.21) reduces to the square of the standard correlation
coefficient between the N = 2 variables.

A slightly different but equivalent formulation is as follows. Consider the
regression of a random variable X_i on the (N − 1)-dimensional random
vector ξ_i(t) = [X_1(t),..., X_{i−1}(t), X_{i+1}(t),..., X_N(t)]^T, i.e., the evaluation of a
vector φ of regression coefficients in the linear formula:

X_i = \sum_{j \neq i} \phi_j X_j + \epsilon_i = \phi^T \cdot \xi_i + \epsilon_i ,    (4.23)

where ε_i is a regression residual. If the vector φ is defined by the least-squares
method of minimizing

\sum_{t=1}^{L} \left( \phi^T \cdot \xi_i(t) - X_i(t) \right)^2    (4.24)

with respect to φ, then its estimate is easily obtained as

\hat{\phi} = S_{\xi_i \xi_i}^{-1} \, S_{\xi_i X_i} .    (4.25)

Let ζ̂_i = φ̂^T · ξ_i denote the contribution to the regression (4.23) for this
estimate (4.25). Since

\mathrm{Cov}(X_i, \hat{\zeta}_i) = \mathrm{Cov}\left(X_i,\; S_{X_i \xi_i}\, S_{\xi_i \xi_i}^{-1}\, \xi_i\right) = S_{X_i \xi_i} \cdot S_{\xi_i \xi_i}^{-1} \cdot S_{\xi_i X_i} ,    (4.26)

it follows that the squared correlation coefficient between the fitted value
φ̂^T · ξ_i, with φ̂ given by (4.25), and X_i is equal to the scalar
S_{X_i ξ_i} · S_{ξ_i ξ_i}^{-1} · S_{ξ_i X_i} · S_{X_i X_i}^{-1}, which is nothing but the
maximum eigenvalue of the matrix (4.21) [398]. This shows that the canonical
coefficient of N-correlation:

\rho_N = S_{X_i \xi_i} \cdot S_{\xi_i \xi_i}^{-1} \cdot S_{\xi_i X_i} \cdot S_{X_i X_i}^{-1} ,    (4.27)


can be determined from the solution of the regression problem (4.23, 4.24). 
This correspondence between the two formulations is rooted in the equivalence 
between linear correlation and the coefficient of linear regression, as pointed 
out above. 

Again, this canonical coefficient of N-correlation is, by construction, in- 
variant under linear transformations of each X; individually. However, it is 
not left unchanged under nonlinear monotonic transformations. It is therefore 
necessary to look for other measures of dependence which are only functions 
of the copula. The concordance measures described below enjoy this property. 
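The regression formulation lends itself to a short numerical sketch. The code below estimates a covariance matrix, solves the normal equations of (4.25) with a small hand-written Gaussian elimination, and returns the scalar of (4.27) for a chosen component. All function names are illustrative, and the 1/L normalization of the covariance estimate is an assumption (the coefficient itself does not depend on it).

```python
def covariance_matrix(rows):
    """Sample covariance of observations (each row is one vector X(t))."""
    n_obs, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n_obs for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / n_obs for b in range(dim)]
            for a in range(dim)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def canonical_n_correlation(cov, i):
    """Return S_{X xi} S_{xi xi}^{-1} S_{xi X} / S_{XX} for component i."""
    others = [j for j in range(len(cov)) if j != i]
    s_xi_xi = [[cov[a][b] for b in others] for a in others]
    s_xi_x = [cov[a][i] for a in others]
    phi_hat = solve(s_xi_xi, s_xi_x)          # least-squares coefficients
    return sum(p * s for p, s in zip(phi_hat, s_xi_x)) / cov[i][i]


# For N = 2 the coefficient is the squared correlation, here 0.6**2 / (1 * 4).
print(canonical_n_correlation([[1.0, 0.6], [0.6, 4.0]], 0))
```

The returned value is the coefficient of determination of the regression of X_i on the other components, which is why it lies between 0 and 1.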


4.2 Concordance Measures 


4.2.1 Kendall’s Tau 


A fundamental question for financial risk management is the following: 
“Do the prices of two (or more) assets tend to rise or fall together?” 


If the answer is affirmative, the diversification of risks will probably be dif- 
ficult, since diversification is based upon the fact that the fall of an asset is 
statistically balanced by the rise of another one. A natural way to quantify 
the propensity of assets to move together is to compare the probability that 
they rise (or fall) together with the probability that one of the two assets rises 
(respectively falls) while the other one falls (respectively rises). This can be 
translated mathematically as follows. Starting with two independent
realizations (X_1, Y_1) and (X_2, Y_2) of the same pair of random variables (X,Y), let
us consider the quantity


\tau = \Pr\left[(X_1 - X_2) \cdot (Y_1 - Y_2) > 0\right] - \Pr\left[(X_1 - X_2) \cdot (Y_1 - Y_2) < 0\right] .    (4.28)


The left-most term in the r.h.s. (right-hand side) gives the probability of
concordance, i.e., the probability that X and Y move together upward or
downward. In contrast, the right-most term in the r.h.s. represents the probability
of discordance, i.e., the probability that the two random variables move in
opposite directions.

The expression (4.28) defines the population version of the so-called
Kendall’s τ. This quantity is invariant under increasing transformation of the
marginal distributions. Indeed, given any increasing mapping G_X and G_Y, we
have

X_1 > X_2 \iff G_X(X_1) > G_X(X_2) ,    (4.29)
Y_1 > Y_2 \iff G_Y(Y_1) > G_Y(Y_2) .    (4.30)

As a consequence, Kendall’s τ depends only on the copula of (X,Y). For
continuous random variables, expression (4.28) can be transformed into

\tau = 2 \Pr\left[(X_1 - X_2) \cdot (Y_1 - Y_2) > 0\right] - 1 ,    (4.31)

which yields the following expression in terms of a functional of the copula C
of the two random variables:

\tau(C) = 4 \iint_{[0,1]^2} C(u,v)\, dC(u,v) - 1 .    (4.32)


From this equation, one easily checks that Kendall’s τ varies between −1
and +1. The lower bound is reached if and only if the variables (X,Y) are
countermonotonic, while the upper bound is attained if and only if (X,Y) are
comonotonic. In addition, τ equals zero for independent random variables.
However, as for the (linear) correlation coefficient, τ may vanish even for
non-independent random variables.
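On a sample, the probabilities of concordance and discordance in (4.28) can be replaced by the corresponding frequencies over all pairs of observations. The following sketch (a plain O(n²) loop, with an illustrative function name) implements this sample version and assumes there are no ties, as is the case almost surely for continuous variables.

```python
import math


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (prod > 0) - (prod < 0)     # +1, -1, or 0 for a tie
    return score / (n * (n - 1) / 2)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
print(kendall_tau(xs, ys))
# Increasing transformations of either variable leave the value unchanged.
print(kendall_tau([math.exp(x) for x in xs], [y ** 3 for y in ys]))
```

Because only the signs of the differences enter the computation, the estimate depends on the ranks alone, in line with the invariance property stated above.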

In spite of its attractive structure, (4.32) is not always very useful for calcu- 
lations and one often has to resort to numerical integration (by use of quadra- 
ture, for instance). However, some more tractable expressions have been found 
for particular families of copulas. 


Archimedean Copulas 


Genest and MacKay [198] have shown that, for generators φ which are strictly
decreasing functions from [0,1] onto [0, ∞] with φ(1) = 0, Kendall’s τ of the
Archimedean copula

C(u,v) = \varphi^{-1}\left(\varphi(u) + \varphi(v)\right)    (4.33)

is given by

\tau = 1 + 4 \int_0^1 \frac{\varphi(t)}{\varphi'(t)}\, dt .    (4.34)

This expression relies on the general fact that (4.32) can be rewritten as

\tau = 4 \cdot E\left[C(U,V)\right] - 1 ,    (4.35)

where U and V are uniform random variables with joint distribution function
C. Now, in the particular case of an Archimedean copula, one can show that
[370]

\Pr\left[C(U,V) \le t\right] = t - \frac{\varphi(t)}{\varphi'(t)} ,    (4.36)

which immediately yields the results given by (4.34). Table 4.1 provides closed
form expressions for Kendall’s τ’s of Clayton’s copula, Gumbel’s copula and
Frank’s copula, which are shown in Fig. 4.3 as a function of their corresponding
form parameters θ.


Table 4.1. Expression of Kendall’s τ for three Archimedean copulas. D_1 denotes
the Debye function D_1(x) = (1/x) ∫_0^x t/(e^t − 1) dt

Copula     φ(t)                                   Kendall’s τ               Range
Clayton    (1/θ) (t^{−θ} − 1)                      θ/(θ + 2)                 θ ∈ [−1, ∞)
Gumbel     (−ln t)^θ                              (θ − 1)/θ                 θ ∈ [1, ∞)
Frank      −ln[(e^{−θt} − 1)/(e^{−θ} − 1)]        1 − (4/θ) [1 − D_1(θ)]    θ ∈ (−∞, ∞)



Fig. 4.3. Graph of Kendall’s τ’s as a function of the form parameter θ defined
in Table 4.1, for Clayton’s copula (dotted line), Gumbel’s copula (dashed line) and
Frank’s copula (plain line). Kendall’s τ for Frank’s copula is symmetric with respect
to the origin
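The integral formula for Archimedean copulas, τ = 1 + 4 ∫_0^1 φ(t)/φ′(t) dt, can be evaluated by simple quadrature. The sketch below uses a midpoint rule (an arbitrary choice) and compares the result with the standard closed forms θ/(θ + 2) for Clayton’s copula and 1 − 1/θ for Gumbel’s copula; the value θ = 2 is chosen only for illustration.

```python
import math


def kendall_tau_archimedean(phi, dphi, n=100_000):
    """Evaluate 1 + 4 * int_0^1 phi(t) / phi'(t) dt with the midpoint rule.

    phi  : generator, strictly decreasing on (0, 1] with phi(1) = 0
    dphi : its derivative
    """
    h = 1.0 / n
    total = sum(phi((k + 0.5) * h) / dphi((k + 0.5) * h) for k in range(n))
    return 1.0 + 4.0 * h * total


theta = 2.0

tau_clayton = kendall_tau_archimedean(
    lambda t: (t ** -theta - 1.0) / theta,          # Clayton generator
    lambda t: -(t ** (-theta - 1.0)),
)
tau_gumbel = kendall_tau_archimedean(
    lambda t: (-math.log(t)) ** theta,              # Gumbel generator
    lambda t: -theta * (-math.log(t)) ** (theta - 1.0) / t,
)

print(f"Clayton: numeric {tau_clayton:.6f}   closed form {theta / (theta + 2.0):.6f}")
print(f"Gumbel:  numeric {tau_gumbel:.6f}   closed form {1.0 - 1.0 / theta:.6f}")
```

The midpoint rule never evaluates the generators at t = 0 or t = 1, where they or their derivatives may be singular or vanish.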
Elliptical Copulas


This particularly useful family of copulas also allows for tractable calculation
of Kendall’s τ. Generalizing the result originally obtained by Stieltjes [115] for
the Gaussian distribution, Lindskog et al. [307] have shown that the relation

\tau = \frac{2}{\pi} \arcsin \rho    (4.37)



holds for any pair of random variables whose dependence structure is given 
by an elliptical copula. The parameter ρ denotes the shape coefficient (or
correlation coefficient, when it exists) of the elliptical distribution naturally 
associated with the considered elliptical copula. 

This result is particularly interesting because it provides a robust
estimation method for the shape parameter ρ. Of course, when the elliptical
distribution associated with the elliptical copula admits a second moment,
the correlation coefficient exists and ρ can be estimated from Pearson’s
coefficient (4.2). However, when the elliptical distribution does not admit a second
moment, this approach fails. In this case, Kendall’s τ has the advantage of
always existing and of being easily estimated. In fact, its superiority is even
greater, as demonstrated by Fig. 4.4 which shows that estimates of τ yield
more robust estimates of ρ via (4.37). This is especially true when the tails
of the marginals associated with the elliptical distributions are heavy. In the
example depicted in Fig. 4.4, we have considered two Student’s distributions
with three and ten degrees of freedom respectively. While the estimates of
ρ provided by Kendall’s τ (dashed curve) remain approximately equally
efficient in both cases, the efficiency of the estimates of ρ provided by Pearson’s
coefficient (continuous curve) drops dramatically as the number of degrees
of freedom of the Student’s distributions decreases. This phenomenon can be
ascribed to the fact that, with three degrees of freedom, the correlation
coefficient still exists but the asymptotic distribution of Pearson’s coefficient is
not Gaussian but has a heavy tail, as recalled in Sect. 4.1.

Fig. 4.4. Probability density function of the correlation coefficient ρ estimated
from synthetic realizations generated with a Student’s distribution with three degrees
of freedom (left panel) and ten degrees of freedom (right panel), both with a true
value of ρ = 0.6 for a sample size equal to 100. The continuous curve represents the
pdf obtained from Pearson’s estimator while the dashed curve gives the pdf obtained
when estimating Kendall’s τ and applying (4.37)
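The estimation route described here amounts to inverting (4.37). A minimal sketch of the two conversions is given below; the function names are illustrative, and the estimate of τ fed to `rho_from_tau` can come from any sample estimator of Kendall’s τ.

```python
import math


def tau_from_rho(rho):
    """Kendall's tau of an elliptical copula with shape coefficient rho."""
    return 2.0 / math.pi * math.asin(rho)


def rho_from_tau(tau):
    """Inverse relation: shape coefficient implied by a value of Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)


tau = tau_from_rho(0.6)
print(f"rho = 0.6  ->  tau = {tau:.4f}  ->  rho = {rho_from_tau(tau):.4f}")
```

Since the sine function maps [−π/2, π/2] onto [−1, 1], any estimate of τ in [−1, 1] yields an admissible value of ρ, which is not guaranteed for every alternative estimator.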


4.2.2 Measures of Similarity Between Two Copulas 


Consider two copulas C_1, C_2 and the copula C = ω · C_1 + (1 − ω) · C_2, with
ω ∈ [0,1]. Chapter 3 has recalled that the convex sum of several copulas
remains a copula. Kendall’s τ of copula C can be written as

\tau_C = \omega^2 \cdot \tau_{C_1} + 2\omega(1 - \omega) \cdot Q(C_1, C_2) + (1 - \omega)^2 \cdot \tau_{C_2} ,    (4.38)

with

Q(C_1, C_2) = 4 \iint_{[0,1]^2} C_1(u,v)\, dC_2(u,v) - 1 ,    (4.39)
            = 4 \iint_{[0,1]^2} C_2(u,v)\, dC_1(u,v) - 1 .    (4.40)
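As a worked illustration of this decomposition, one can mix the two Fréchet-Hoeffding bound copulas, denoted here M(u,v) = min(u,v) and W(u,v) = max(u + v − 1, 0) (this notation, and the symbol ω for the mixing weight, are assumptions made for the example). The copula W puts all its mass uniformly on the anti-diagonal v = 1 − u, so the cross term reduces to a one-dimensional integral:

```latex
\begin{align*}
\tau_M &= 1 , \qquad \tau_W = -1 , \\
Q(M, W) &= 4 \iint_{[0,1]^2} M(u,v)\, dW(u,v) - 1
         = 4 \int_0^1 \min(u, 1-u)\, du - 1
         = 4 \cdot \tfrac{1}{4} - 1 = 0 , \\
\tau_C &= \omega^2 \cdot 1 + 2\omega(1-\omega) \cdot 0 + (1-\omega)^2 \cdot (-1)
        = 2\omega - 1 .
\end{align*}
```

Kendall’s τ of the mixture thus interpolates linearly between −1 and +1, and vanishes for the equally weighted mixture ω = 1/2 even though the two variables are then far from independent.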


To provide an intuitive interpretation of Q(C_1, C_2), let us consider two copulas
C_1 and C_2 with identical Kendall’s τ: τ_{C_1} = τ_{C_2} = τ. This means that,
through the prism of Kendall’s τ, these two copulas C_1 and C_2 have the same
dependence. Kendall’s τ of the copula C formed by their convex sum will also
be equal to τ, for all values of ω, if and only if Q(C_1, C_2) = τ, that is, if
expression (4.40) is equal to expression (4.32) obtained for either C_1 or C_2.
The difference

4 \iint_{[0,1]^2} \left[ C_1(u,v) - C_2(u,v) \right] dC_2(u,v)    (4.41)

between these two expressions (4.40) and (4.32) therefore allows one to define
the notion of proximity between two copulas.

Since any copula is bounded by the Fréchet-Hoeffding upper and lower 
bounds, we have 


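\iint_{[0,1]^2} \left[ \max(u + v - 1, 0) - C_2(u,v) \right] dC_2(u,v)
    \le \iint_{[0,1]^2} \left[ C_1(u,v) - C_2(u,v) \right] dC_2(u,v)    (4.42)
    \le \iint_{[0,1]^2} \left[ \min(u,v) - C_2(u,v) \right] dC_2(u,v) ,

which can be rewritten as

\int_0^1 C_2(u, 1-u)\, du - \frac{\tau_{C_2} + 1}{4}
    \le \iint_{[0,1]^2} \left[ C_1(u,v) - C_2(u,v) \right] dC_2(u,v)
    \le \int_0^1 C_2(u,u)\, du - \frac{\tau_{C_2} + 1}{4} .    (4.43)

The left-most term is always negative while the right-most one is positive. In
the case where the left-most term is the opposite of the right-most one,

\int_0^1 C_2(u,u)\, du - \frac{\tau_{C_2} + 1}{4} = - \left[ \int_0^1 C_2(u, 1-u)\, du - \frac{\tau_{C_2} + 1}{4} \right] \neq 0 ,    (4.44)

one can renormalize expression (4.43) to obtain

-1 \le \frac{\iint_{[0,1]^2} \left[ C_1(u,v) - C_2(u,v) \right] dC_2(u,v)}{\int_0^1 C_2(u,u)\, du - \frac{\tau_{C_2} + 1}{4}} \le 1 .    (4.45)
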
Choosing a fixed copula C_2 as a reference, this provides a new dependence
measure, which allows one to assess the similarity between any copula C_1 and
the reference copula C_2. Two particular choices of C_2 have been studied in the
literature:


Spearman’s Rho 


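Let us choose C_2 as the product copula Π(u,v) = u · v, describing
independence. Since Kendall’s τ of the product copula vanishes, one easily checks that

\int_0^1 u^2\, du - \frac{\tau_{C_2} + 1}{4} = - \left[ \int_0^1 u \cdot (1 - u)\, du - \frac{\tau_{C_2} + 1}{4} \right] = \frac{1}{12} ,    (4.46)

while

\int_0^1 \int_0^1 u \cdot v \; du\, dv = \frac{1}{4} ,    (4.47)

so that the central fraction in (4.45) leads one to define the so-called Spearman’s
rho:

\rho_s(C) = 12 \iint_{[0,1]^2} C(u,v)\, du\, dv - 3 .    (4.48)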


This equation can be interpreted as the difference between the probability of
concordance and the probability of discordance for the two pairs of random
variables (X_1, Y_1) and (X_2, Y_3), where the pairs (X_1, Y_1), (X_2, Y_2) and (X_3, Y_3)
are three independent realizations drawn from the same distribution:

\rho_s = 3 \left( \Pr\left[(X_1 - X_2)(Y_1 - Y_3) > 0\right] - \Pr\left[(X_1 - X_2)(Y_1 - Y_3) < 0\right] \right) .    (4.49)


By definition, Spearman’s rho equals zero for independent random vari- 
ables while the lower (resp. upper) bound is reached if and only if the random 
variables are countermonotonic (resp. comonotonic). An alternative expres- 
sion is 


\rho_s(C) = 12 \iint_{[0,1]^2} u \cdot v \; dC(u,v) - 3 .    (4.50)


It highlights the fact that Spearman’s rho is related to the linear correlation of
the ranks. Indeed, considering two random variables X and Y, with marginal
distributions F_X and F_Y, it is straightforward to check that

\rho_s = \frac{\mathrm{Cov}\left(F_X(X), F_Y(Y)\right)}{\sqrt{\mathrm{Var}\, F_X(X) \cdot \mathrm{Var}\, F_Y(Y)}} .    (4.51)



Our introduction of Spearman’s rho, motivated from Kendall’s τ, shows
that they are closely related. In fact, given any copula C, Kruskal [281] has
shown that

\frac{3\tau - 1}{2} \le \rho_s \le \frac{1 + 2\tau - \tau^2}{2} , \qquad \tau \ge 0 ,    (4.52)

\frac{\tau^2 + 2\tau - 1}{2} \le \rho_s \le \frac{1 + 3\tau}{2} , \qquad \tau \le 0 .    (4.53)

Figure 4.5 shows that the area of accessible values for the couple (τ, ρ_s)
represents a relatively narrow strip, reflecting the strong relation between Kendall’s
τ and Spearman’s rho.

Fig. 4.5. The shaded area represents the allowed values for the couple (τ, ρ_s)
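The rank interpretation of Spearman’s rho and the admissible region for the couple (τ, ρ_s) can be sketched as follows. The sample estimator below is the linear correlation of the ranks, written in its classical form based on squared rank differences, and it assumes there are no ties; the bounds implemented are (3τ − 1)/2 ≤ ρ_s ≤ (1 + 2τ − τ²)/2 for τ ≥ 0 and (τ² + 2τ − 1)/2 ≤ ρ_s ≤ (1 + 3τ)/2 for τ ≤ 0. Function names are illustrative.

```python
def ranks(values):
    """Ranks 1..n of the observations (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(xs, ys):
    """Linear correlation of the ranks: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def spearman_bounds(tau):
    """Interval of values of Spearman's rho compatible with a given Kendall's tau."""
    if tau >= 0:
        return (3 * tau - 1) / 2, (1 + 2 * tau - tau ** 2) / 2
    return (tau ** 2 + 2 * tau - 1) / 2, (1 + 3 * tau) / 2


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]       # Kendall's tau of this sample is 0.6
lower, upper = spearman_bounds(0.6)
print(f"rho_s = {spearman_rho(xs, ys):.2f}, admissible range [{lower:.2f}, {upper:.2f}]")
```

The bounds hold for population values; a small sample such as the one above merely illustrates how narrow the admissible strip is.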
Gini’s Gamma


Instead of choosing the reference copula C_2(u,v) = Π(u,v) = u · v, one can
consider an equally weighted mixture of the two copulas expressing monotonic
dependence:

C_2(u,v) = \frac{1}{2} \min(u,v) + \frac{1}{2} \max(u + v - 1, 0) .    (4.54)

The central fraction in (4.45) then measures how far a given copula C_1 is from
monotonic dependence. Simple algebraic manipulations show that

\int_0^1 C_2(u,u)\, du - \frac{\tau_{C_2} + 1}{4} = - \left[ \int_0^1 C_2(u, 1-u)\, du - \frac{\tau_{C_2} + 1}{4} \right] = \frac{1}{8}    (4.55)

and

\iint_{[0,1]^2} C_2(u,v)\, dC_2(u,v) = \frac{1}{4} .    (4.56)

The central fraction in (4.45) then yields the so-called Gini’s gamma:

\gamma(C) = 4 \left[ \int_0^1 C(u,u)\, du + \int_0^1 C(u, 1-u)\, du - \frac{1}{2} \right] .    (4.57)

Note that this measure of dependence only relies on the values taken by C on
its main diagonals. The alternative expression

\gamma(C) = 4 \left[ \int_0^1 \left( C(u,u) - \max(2u - 1, 0) \right) du - \int_0^1 \left( \min(u, 1-u) - C(u, 1-u) \right) du \right]    (4.58)

shows that Gini’s gamma represents the difference of the area between the
values of C(u,v) and max(u + v − 1, 0) on the first diagonal and between the
value of C(u,v) and min(u,v) on the second diagonal (see the shaded areas
in Fig. 4.6).
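Since Gini’s gamma only involves the two diagonals of the copula, it can be computed with two one-dimensional quadratures. The sketch below applies a midpoint rule (an arbitrary choice) to the expression 4 [∫ C(u,u) du + ∫ C(u,1−u) du − 1/2] and evaluates it for the product copula and the two monotonic copulas.

```python
def gini_gamma(copula, n=20_000):
    """4 * (int_0^1 C(u,u) du + int_0^1 C(u,1-u) du - 1/2), midpoint rule."""
    h = 1.0 / n
    nodes = [(k + 0.5) * h for k in range(n)]
    main = h * sum(copula(u, u) for u in nodes)            # first diagonal
    anti = h * sum(copula(u, 1.0 - u) for u in nodes)      # second diagonal
    return 4.0 * (main + anti - 0.5)


def independent(u, v):
    return u * v


def comonotonic(u, v):
    return min(u, v)


def countermonotonic(u, v):
    return max(u + v - 1.0, 0.0)


for name, cop in [("independent", independent),
                  ("comonotonic", comonotonic),
                  ("countermonotonic", countermonotonic)]:
    print(f"{name:>17}: gamma = {gini_gamma(cop):+.6f}")
```

The three values are close to 0, +1 and −1 respectively, as expected from a concordance measure.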


4.2.3 Common Properties of Kendall’s Tau, Spearman’s Rho 
and Gini’s Gamma 


The three measures of dependence — Kendall’s tau, Spearman’s rho and Gini’s 
gamma — presented in the previous paragraphs enjoy the same set of proper- 
ties: 


1. they are defined for any pair of continuous random variables X and Y,
2. they are symmetric: for any pair X and Y, τ(X,Y) = τ(Y,X), for instance,
3. they range from −1 to +1, and reach these bounds when X and Y are
   countermonotonic and comonotonic respectively,
4. they equal zero for independent random variables,
5. if the pair of random variables (X_1, X_2) is more dependent than the pair
   (Y_1, Y_2) in the following sense:

   C_X(u,v) \ge C_Y(u,v) , \qquad \forall u, v \in [0,1] ,    (4.59)

   then the same ranking holds for any of these three measures; for instance,
   τ(X_1, X_2) ≥ τ(Y_1, Y_2).

Fig. 4.6. The shaded surface represents the area between the values of C(u,v)
(here the product copula Π(u,v) = u · v) and max(u + v − 1, 0) on the first diagonal
and between the value of C(u,v) and min(u,v) on the second diagonal


Any measure of dependence fulfilling these five properties is named a
concordance measure. The central fraction in (4.45), with any exchangeable
copula C_2 such that condition (4.44) is fulfilled together with ρ_s(C_2) = 3 τ(C_2),
ensuring that the numerator of the central term of (4.45) vanishes for
C_1(u,v) = u · v, provides a measure of dependence which satisfies the five
conditions above, and is thus a concordance measure.


4.3 Dependence Metric 


Concordance measures fulfill most of the requirements expected from a
measure of dependence. Following Granger et al. [214], one can impose slightly
more demanding properties for a functional measure F[X,Y] of dependence
between two random variables X and Y, which strengthen properties 1-4 of
concordance measures as follows:

1. F is well defined for both continuous and discrete random variables,
2. F is invariant under continuous and strictly increasing transformations of
   the random variables, i.e., F depends only on the copula of X and Y,
3. F is a distance,
4. F equals 0 if X and Y are independent, and varies between 0 and 1,
5. F equals 1 (or, at least, reaches a maximum) if there exists a measurable
   mapping between the random variables X and Y: X = f(Y),
6. F has a simple relationship with the (linear) correlation coefficient in the
   case of a bivariate normal distribution.

Dependence measures satisfying all these requirements are named dependence
metrics.

As an example, one can consider the measure introduced by Bhattacharyya,
Matusita and Hellinger:

S = \frac{1}{2} \iint \left[ 1 - \left( \frac{f(x)\, g(y)}{h(x,y)} \right)^{1/2} \right]^2 dH(x,y) ,    (4.60)

where f and g denote the marginal densities of X and Y respectively, while h
and H are the bivariate density and distribution functions of (X,Y). Simple
algebraic manipulations give

S = \frac{1}{2} \iint_{[0,1]^2} \left[ 1 - c(u,v)^{1/2} \right]^2 du\, dv ,    (4.61)

where c is the density of the copula of X and Y, showing that S agrees
with the second requirement. Properties 3-5 are easy to check while the last
requirement has been established in [442]. Indeed, for two random variables
with Gaussian copula and shape coefficient ρ, one has


(fs ae ee (4.62) 


This dependence metric is in fact related to a generalized relative
entropy between the joint density h and the product density f · g. Consider
the generalized Kullback-Leibler distance (obtained by symmetrization of the
Kullback-Leibler divergence) for the k-class entropy family [225] defined by

H_k(f) = \frac{1}{k - 1} \left( 1 - E\left[ f^{k-1} \right] \right) , \qquad k \neq 1 ,    (4.63)
       = - E\left[ \ln f \right] , \qquad k = 1 ,    (4.64)

where f is the density of the random variable (or vector) under consideration.¹
In the particular case k = 1, one retrieves the usual Shannon entropy. One
can then show that S is equal to one-fourth of the symmetric relative entropy
of h and f · g for the 1/2-class entropy.

¹ This k-class entropy is also known as Tsallis entropy of order k in the physics
literature [476] and has many applications to characterize complex systems with
nonseparable long-range space/time dependences.

Dependence metrics such as S provide very useful tools to test the presence 
of complicated serial dependences. This is particularly important not only to 
analyze and forecast financial time series [214] but also to test the goodness- 
of-fit in copula modeling, as we shall see in Chap. 5. 
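To make the definition of S concrete, the sketch below evaluates its discrete analogue, S = (1/2) Σ (√p_ij − √(p_i· p_·j))², for a joint probability table. Using the discrete form is an assumption made for illustration (the text states the measure for densities), and the function name is hypothetical.

```python
import math


def hellinger_dependence(joint):
    """Discrete analogue of S for a joint pmf given as a list of rows.

    Compares each joint probability with the product of its two marginals.
    """
    row = [sum(r) for r in joint]                  # marginal of the first variable
    col = [sum(c) for c in zip(*joint)]            # marginal of the second variable
    return 0.5 * sum(
        (math.sqrt(p) - math.sqrt(row[i] * col[j])) ** 2
        for i, r in enumerate(joint)
        for j, p in enumerate(r)
    )


independent = [[0.25, 0.25], [0.25, 0.25]]
functional = [[0.5, 0.0], [0.0, 0.5]]              # X determines Y exactly

print(hellinger_dependence(independent))
print(hellinger_dependence(functional))
```

For the independent table the measure vanishes, while for the table with a deterministic relation it reaches 1 − 1/√2, its largest value for two equally likely states; this illustrates why the fifth requirement allows for "a maximum" rather than the value 1 in the discrete case.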


4.4 Quadrant and Orthant Dependence 


In practice, it is often useful to characterize the dependence of more than 
two variables. For instance, risk management deals with portfolios made of 
dozens up to tens of thousands of assets. The analysis of the risks associated 
with such portfolios requires the assessment of the dependence between many 
(N) variables. It is in general not true that the genuine multivariate
dependence between the N variables can be adequately quantified by N(N − 1)/2
dependence measures between all possible pairs. A first approach to define
generalized correlations between N > 2 random variables has already been
described in Sect. 4.1.3. We now discuss other measures which can be shown
to be pure copula properties. 

First, note that the previous concept of concordance cannot be easily
extended to more than two random variables. The intuition behind this
statement can be obtained by taking the example of Kendall’s τ. One could think
of generalizing the integral expression (4.32) of Kendall’s τ to higher
dimensions. However, this straightforward generalization loses several nice properties
of the concordance measures. In particular, the concept of countermonotonicity
cannot be used for more than two random variables. Indeed, consider
three random variables X, Y and Z, such that (X,Y) and (Y,Z) are
countermonotonic; then (X,Z) are necessarily comonotonic.² As a consequence, even
if the extension to higher dimensions of Kendall’s τ remains bounded by −1
(by the Fréchet-Hoeffding inequality), it is not guaranteed that this bound can
still be reached. Therefore, the interpretation of a negative value for such a
generalized Kendall’s τ would not be obvious.

² This effect is related to the concept of “frustration” introduced in statistical
physics to describe situations in which constraints tending to create opposite
states in two interacting variables cannot all be obeyed in systems of three or
more elements [475, 481]. Frustration leads in general to multiple equilibria [360].

In order to provide measures of dependence which do not suffer from this
problem, let us first introduce the notion of positive quadrant dependence
[300]. Two random variables X and Y are positive quadrant dependent (PQD)
if

\Pr[X \le x, Y \le y] \ge \Pr[X \le x] \cdot \Pr[Y \le y] , \qquad \forall x, y .    (4.65)

This inequality means that the probability that the two random variables X
and Y are simultaneously small is at least as large as it would be if these two
random variables were independent. If X and Y represent the returns of two
PQD assets, the probability that they undergo simultaneous large losses is not
less than it would be if they were independent. As a consequence, one expects
(and it can be shown [130]) that risk-averse investors prefer a portfolio X̃ + Ỹ
made of independent replications X̃ and Ỹ of assets X and Y to a portfolio X + Y
made of the actual PQD assets. This means that, for any increasing concave
utility function U,

E\left[ U(X + Y) \right] \le E\left[ U(\tilde{X} + \tilde{Y}) \right] .    (4.66)

Inequality (4.65) can be rewritten as

\Pr[X > x, Y > y] \ge \Pr[X > x] \cdot \Pr[Y > y] , \qquad \forall x, y .    (4.67)


This defines two random variables as PQD if the probability that they are 
simultaneously large or small is at least as large as it would be if these two 
random variables were independent. This definition is relevant for risk
management purposes, since it amounts to asking whether large losses of individual
assets tend to occur more frequently together than they would if the assets
were independent.

Definition (4.65) implies that X and Y are PQD if and only if their copula 
C satisfies 


C(u,v) \ge \Pi(u,v) = u \cdot v , \qquad \forall u, v \in [0,1] .    (4.68)


This ensures that the PQD property depends only on the dependence structure 
of the random variables (and not on their marginals). 
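The copula criterion C(u,v) ≥ u · v lends itself to a direct numerical check on a grid. The sketch below does so for the bivariate Clayton copula with a positive parameter (θ = 2 is an arbitrary illustrative value) and, as a counter-example, for the countermonotonic copula. A grid check is of course only indicative and not a proof.

```python
def clayton(u, v, theta):
    """Bivariate Clayton copula, written here for theta > 0 only."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def is_pqd_on_grid(copula, n=50, tol=1e-12):
    """Check C(u, v) >= u * v at the n * n midpoints of a regular grid."""
    grid = [(k + 0.5) / n for k in range(n)]
    return all(copula(u, v) >= u * v - tol for u in grid for v in grid)


print(is_pqd_on_grid(lambda u, v: clayton(u, v, 2.0)))          # Clayton, theta = 2
print(is_pqd_on_grid(lambda u, v: max(u + v - 1.0, 0.0)))       # countermonotonic
```

The tolerance only absorbs floating-point rounding, so that the product copula itself, which satisfies the criterion with equality, is also accepted.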

The PQD property and the concordance measures are intimately related. 
Indeed, as recalled in Sect. 4.2.3, if the pair of random variables (X1, X2) is 
more dependent than the pair (Yi, Y2), it is also more concordant. So, any 
PQD pair of random variables is more concordant than independent pairs of 
random variables. But, since any concordance measure equals zero for inde- 
pendent random variables, we can assert that, given any concordance mea- 
sure, any pair of PQD random variables has a positive concordance measure. 
In particular, Kendall’s tau, Spearman’s rho or Gini’s gamma are necessarily 
positive for PQD random variables. Besides, (4.48) shows that Spearman’s
rho is a kind of averaged positive quadrant dependence.

To conclude this brief survey of the properties of PQD random variables, 
let us stress that the same result holds for the usual linear correlation coeffi- 
cient. Indeed, by Hoeffding identity (4.9), any PQD random variables exhibit 
a nonnegative correlation coefficient. Unfortunately, the converse does not 
hold. However, given two random variables X and Y such that the linear 
correlation coefficient p(f(X),g(Y)) exists and is non-negative for any non- 
decreasing functions f and g, then these two random variables are PQD [300]. 

Let us now generalize the bivariate concept of positive quadrant
dependence to the multivariate concept of positive orthant dependence. We will say
that N random variables X_1, X_2,..., X_N are Positive Lower Orthant
Dependent (PLOD) if

\Pr[X_1 \le x_1, \ldots, X_N \le x_N] \ge \Pr[X_1 \le x_1] \cdots \Pr[X_N \le x_N] ,    (4.69)

for all x_i’s. As in the bivariate case, this equation simply means that the
probability that the N random variables X_1,..., X_N are simultaneously small
is at least as large as it would be if these N random variables were independent.

Similarly, N random variables X_1, X_2,..., X_N are Positive Upper Orthant
Dependent (PUOD) if

\Pr[X_1 > x_1, \ldots, X_N > x_N] \ge \Pr[X_1 > x_1] \cdots \Pr[X_N > x_N] ,    (4.70)

for all x_i’s. Again, this equation has a simple interpretation: the probability
that the N random variables X_1,..., X_N are simultaneously large is at least as
large as it would be if these N random variables were independent. Note that
the two definitions (4.69) and (4.70) are not equivalent anymore for N > 2.

Finally, N random variables X_1, X_2,..., X_N are Positive Orthant
Dependent (POD) if they are both PUOD and PLOD: the probability that the N
random variables X_1,..., X_N are simultaneously small or large is at least as
large as it would be if these N random variables were independent.

In terms of copulas, these definitions can be expressed as follows. Given an
N-dimensional random vector X = (X_1,..., X_N) with copula C,

X \text{ is PLOD} \iff C(u_1, \ldots, u_N) \ge \prod_{i=1}^{N} u_i , \qquad \forall u_i \in [0,1] ,    (4.71)

and

X \text{ is PUOD} \iff \bar{C}(u_1, \ldots, u_N) \ge \prod_{i=1}^{N} (1 - u_i) , \qquad \forall u_i \in [0,1] ,    (4.72)

where C̄ denotes the survival copula of C.

For Archimedean copulas, the PLOD property can easily be related to the 
shape of its generator y. In fact, in order for an Archimedean copula Cy, to 
be PLOD, it is sufficient that the mapping 


cE Ry,r> y(e*) (4.73) 


be concave, or at least sub-additive (the former implying the later). Indeed, 
for any u; € [0,1], the assumption that y(e~*) is sub-additive allows us to 
write that 

(ene) Sieeeeal yp (ene) 

(ui) +--+ + ¢ (un) , (4.74) 


y (exp[—(— Inu) —---— (—Inuy))) < 9 
<<” 


so that 
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exp [—(— Int) — ++» —(-Inuy)] < PN (p(u) +--+ (un), (4.75) 


which is equivalent to 


N 
Cy Giresztin) Sue. (4.76) 
1=1 


The proof that the subadditivity of y (e~*) is in fact a necessary and sufficient 
condition for an Archimedean copula to be PLOD can be found in [147]. 

Let us remark that any completely monotonic generator fulfills the re- 
quirement that (4.73) be concave. Therefore, any Archimedean copula which 
admits a generalization to arbitrary dimension is PLOD. Archimedean cop- 
ulas which exist in any dimension necessarily exhibit positive associations 
and their bivariate marginals cannot have negative concordance measures. In 
this respect, the bivariate Clayton or Frank copulas admit an n-dimensional
generalization for positive values of the parameter $\theta$ only.
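
The PLOD inequality (4.71) can be probed numerically. The following is a minimal sketch, assuming the usual closed form of the Clayton copula; the function names, the grid and the parameter values are illustrative choices, and a grid check can only refute PLOD, not prove it.

```python
import itertools
import math


def clayton(u, theta):
    """Clayton copula C(u_1, ..., u_N), assuming the standard closed form.

    theta > 0 is valid in any dimension; -1 <= theta < 0 only for N = 2.
    """
    if theta == 0.0:
        raise ValueError("theta = 0 is the independence limit, not handled here")
    if theta < 0.0 and len(u) != 2:
        raise ValueError("negative theta is only admissible in dimension 2")
    s = sum(x ** (-theta) for x in u) - len(u) + 1.0
    return max(s, 0.0) ** (-1.0 / theta)


def is_plod_on_grid(copula, dim, n=9, tol=1e-12):
    """Check C(u) >= prod(u_i), i.e. (4.71), on an interior grid of [0,1]^dim."""
    grid = [(k + 1) / (n + 1) for k in range(n)]  # 0.1, 0.2, ..., 0.9 for n = 9
    for u in itertools.product(grid, repeat=dim):
        if copula(u) < math.prod(u) - tol:
            return False  # found a point where (4.71) is violated
    return True


print(is_plod_on_grid(lambda u: clayton(u, 2.0), dim=3))   # positive parameter
print(is_plod_on_grid(lambda u: clayton(u, -0.5), dim=2))  # negative parameter
```

With a positive parameter the inequality holds at every grid point, whereas the bivariate copula with a negative parameter already violates it, in line with the remark that only positive parameter values admit an n-dimensional generalization.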

The property of POD is a reasonable assumption for most asset returns. 
This allows us to sharpen the (universal) bound for the VaR of the portfolios 
considered in Fig. 3.11. Instead of considering the Fréchet-Hoeffding lower 
bound in (3.76), one can choose $C_{\inf} = C_{\sup} = \Pi$, where
$\Pi(u,v) = u \cdot v$ is the product copula.

The concept of POD is also appealing for testing whether some trading 
strategies are actually market neutral. Such strategies are very common in the 
alternative investment industry. They aim at decoupling portfolio moves from 
market moves, in order to ensure a better stability of the performance of port- 
folios. Portfolio managers often focus solely on their fund’s beta, trying to keep 
it as small as possible while raising their alpha (the market-independent part 
of the expected return). However, if this approach allows them in principle 
to remove any linear dependence between the portfolio and the market, it to- 
tally neglects nonlinear and extreme dependences. Therefore, testing for POD 
seems necessary in order to check whether a fund is actually market neutral. 
Denuit and Scaillet [127] have proposed a nonparametric test for POD and,
considering the HFR and CSFB/Tremont market neutral hedge fund indices,
they have shown that both of them exhibit weak linear dependence with the
S&P 500 index, as expected, but that POD cannot be rejected between the
CSFB/Tremont market neutral index and the Standard & Poor’s 500. Therefore,
some funds contributing to the composition of the CSFB/Tremont index
may exhibit nonlinear or extreme dependence with the Standard & Poor’s 500. 
This teaches us that focusing on beta is clearly not sufficient to ensure market 
neutrality. We will come back to this problem at the end of this chapter when 
constructing portfolios which minimize the impact of extreme market moves. 


4.5 Tail Dependence

4.5.1 Definition 


Positive quadrant (and more generally orthant) dependence is a very strong 
property. It requires that the relation (4.68) holds for every point on the 
unit square for two variables (in the hypercube for more than two variables). 
It could be interesting to weaken this definition to focus on properties of 
local positive quadrant dependences. For instance, one could wish to focus on 
the lower left corner only, in order to assess whether joint losses occurring 
with (marginal) probability level less than u, say, appear more likely together 
than one could expect from statistically independent losses. Recall that the 
smaller the value of u, the more extreme are the losses. In this vein, the 
notion of tail dependence, aiming at quantifying the propensity of two random 
variables to exhibit concomitant extreme movements, has been introduced as 
a particularly interesting measure of extreme risks. 

The concept of tail dependence is appealing in its simplicity. By definition,
the (upper) tail dependence coefficient is

$$ \lambda_U = \lim_{u \to 1^-} \Pr\left[X > F_X^{-1}(u) \mid Y > F_Y^{-1}(u)\right], \tag{4.77} $$

and quantifies the probability of observing a large X, given that Y is itself
large. In other words, given that Y is very large (at some level of probability
u), the probability that X is very large at the same probability level u defines
asymptotically the tail dependence coefficient $\lambda_U$. As an example, if X
and Y represent the volatilities of two different national markets, their
coefficient of tail dependence $\lambda$ gives the probability that one market
exhibits a very high volatility given that the other one does.

One can also interpret expression (4.77) in terms of a Value-at-Risk.
Indeed, the quantiles $F_X^{-1}(u)$ and $F_Y^{-1}(u)$ are nothing but the
Values-at-Risk of assets (or portfolios) X and Y at the confidence level u, if
we count losses as positive. Thus, the coefficient $\lambda_U$ simply provides
the probability that X exceeds the VaR at level u, assuming that Y has exceeded
the VaR at the same probability level u, when this level goes to one. As a
consequence, the probability that both X and Y exceed their VaR at the level u
is asymptotically given by $\lambda_U \cdot (1 - u)$ as $u \to 1$. As an example,
consider a daily VaR calculated at the 99% confidence level. Then, the
probability that both X and Y undergo a loss larger than their VaR at the 99%
level is approximately given by $\lambda_U / 100$. Thus, when $\lambda_U$ is
about 0.1, the typical recurrence time between such concomitant large losses is
about 4 years, while for $\lambda_U \approx 0.5$ it is less than 10 months.
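
The arithmetic behind these recurrence times can be sketched as follows. The figure of 250 trading days per year is an assumption of this sketch (the text does not state the convention used), and the function names are illustrative.

```python
TRADING_DAYS_PER_YEAR = 250  # assumed convention, not stated in the text


def joint_exceedance_probability(lambda_u, confidence):
    """Asymptotic probability that X and Y both exceed their VaR: lambda_U * (1 - u)."""
    return lambda_u * (1.0 - confidence)


def recurrence_time_years(lambda_u, confidence, days_per_year=TRADING_DAYS_PER_YEAR):
    """Mean waiting time, in years, between joint exceedances of a daily VaR."""
    daily_probability = joint_exceedance_probability(lambda_u, confidence)
    return 1.0 / (daily_probability * days_per_year)


for lam in (0.1, 0.5):
    years = recurrence_time_years(lam, 0.99)
    print(f"lambda_U = {lam}: {years:.2f} years, i.e. {12 * years:.1f} months")
```

Under this convention, a joint exceedance probability of 0.001 per day corresponds to one event every 1000 trading days.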


4.5.2 Meaning and Refinement of Asymptotic Independence 


One of the appeals of this definition (4.77) of tail dependence is that it is 
a pure copula property, i.e., it is independent of the margins of X and Y. 
Indeed, let C be the copula of the variables X and Y. If their bivariate copula
C is such that

$$ \lim_{u \to 1^-} \frac{1 - 2u + C(u,u)}{1 - u} = 2 - \lim_{u \to 1^-} \frac{\log C(u,u)}{\log u} = \lambda_U \tag{4.78} $$

exists, then C has an upper tail dependence coefficient $\lambda_U$ (see [106, 149, 147]).
In a similar way, one can define the coefficient of lower tail dependence:

$$ \lambda_L = \lim_{u \to 0^+} \Pr\left\{X < F_X^{-1}(u) \mid Y < F_Y^{-1}(u)\right\} = \lim_{u \to 0^+} \frac{C(u,u)}{u}. \tag{4.79} $$

If $\lambda > 0$,³ the copula presents tail dependence and large events tend to
occur simultaneously, with (conditional) probability $\lambda$. On the contrary,
when $\lambda = 0$, the copula has no tail dependence and the variables X and Y
are said to be asymptotically independent. There is however a subtlety in this
definition (4.77) of tail dependence. To make it clear, first consider the case
where, for large X and Y, the cumulative distribution function F(x, y)
factorizes such that

$$ \lim_{x, y \to \infty} \frac{F(x,y)}{F_X(x)\, F_Y(y)} = 1, \tag{4.80} $$

where $F_X(x)$ and $F_Y(y)$ are the margins of X and Y respectively. This means
that, for X and Y sufficiently large, these two variables can be considered as
independent. It is then easy to show that

$$ \lim_{u \to 1^-} \Pr\left\{X > F_X^{-1}(u) \mid Y > F_Y^{-1}(u)\right\} = \lim_{u \to 1^-} 1 - F_X\left(F_X^{-1}(u)\right) \tag{4.81} $$
$$ = \lim_{u \to 1^-} 1 - u = 0, \tag{4.82} $$

so that independent variables really have no tail dependence, $\lambda = 0$, as
one can expect.

However, the result $\lambda = 0$ does not imply that the multivariate
distribution can be automatically factorized asymptotically, as shown by the
Gaussian example. Indeed, the Gaussian bivariate distribution cannot be
factorized, even asymptotically for extreme values, since the non-diagonal term
of the quadratic form in the exponential function does not become negligible in
general as X and Y go to infinity together. Therefore, in a weaker sense, there
may still be a dependence in the tail even when $\lambda = 0$.

To make this statement more precise, following [106], let us introduce the
coefficient

$$ \bar{\lambda}_U = \lim_{u \to 1^-} \frac{2 \log \Pr\{X > F_X^{-1}(u)\}}{\log \Pr\{X > F_X^{-1}(u),\, Y > F_Y^{-1}(u)\}} - 1 \tag{4.83} $$
$$ = \lim_{u \to 1^-} \frac{2 \log(1 - u)}{\log\left[1 - 2u + C(u,u)\right]} - 1. \tag{4.84} $$


³ In the sequel, $\lambda$ without subscript will represent either $\lambda_U$ or $\lambda_L$.

It can be shown that the coefficient $\bar{\lambda}_U = 1$ if and only if the
coefficient of tail dependence $\lambda_U > 0$, while $\bar{\lambda}_U$ takes
values in $[-1, 1)$ when $\lambda_U = 0$, allowing us to refine the nature of
the dependence in the tail in the case when the tail dependence coefficient is
not sufficiently informative. It has been established that, when
$\bar{\lambda} > 0$, the variables X and Y are simultaneously large more
frequently than independent variables, while simultaneous large deviations of
X and Y occur less frequently than under independence when $\bar{\lambda} < 0$.
In the first case, the variables X and Y can be said to be locally PQD (positive
quadrant dependent) in the neighborhood of the point (0,0) and/or (1,1) in
probability space.

To summarize, independence (factorization of the bivariate distribution)
implies no tail dependence ($\lambda = 0$). But $\lambda = 0$ is not sufficient
to imply factorization and thus true independence, which also requires, as a
necessary condition, that $\bar{\lambda} = 0$.
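
Both coefficients only involve the diagonal section C(u,u) of the copula, so their finite-u counterparts are straightforward to evaluate. The sketch below is an illustration under stated assumptions: the three diagonal sections (independence, comonotonicity and a Gumbel copula) are standard closed forms, and the evaluation level U and all names are arbitrary choices.

```python
import math


def upper_tail_ratio(c_diag, u):
    """Finite-u version of (4.78): (1 - 2u + C(u,u)) / (1 - u)."""
    return (1.0 - 2.0 * u + c_diag(u)) / (1.0 - u)


def lambda_bar_ratio(c_diag, u):
    """Finite-u version of (4.84): 2 log(1 - u) / log(1 - 2u + C(u,u)) - 1."""
    return 2.0 * math.log(1.0 - u) / math.log(1.0 - 2.0 * u + c_diag(u)) - 1.0


def independence(u):
    return u * u  # C(u,u) for the product copula


def comonotone(u):
    return u  # C(u,u) for perfectly dependent variables


def gumbel(theta):
    """Diagonal of the Gumbel copula: C(u,u) = u ** (2 ** (1 / theta))."""
    return lambda u: u ** (2.0 ** (1.0 / theta))


U = 1.0 - 1e-4  # probability level close to one (illustrative choice)
for name, diag in [("independence", independence),
                   ("Gumbel, theta = 2", gumbel(2.0)),
                   ("comonotone", comonotone)]:
    print(f"{name}: lambda_U ~ {upper_tail_ratio(diag, U):.4f}, "
          f"lambda_bar_U ~ {lambda_bar_ratio(diag, U):.4f}")
```

For the Gumbel copula the first ratio is already close to its limit, while the second approaches its limit of 1 only logarithmically in 1 - u, which is worth keeping in mind when such quantities are read off at a finite probability level.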


4.5.3 Tail Dependence for Several Usual Models 


We present several general results allowing for the calculation of the tail de- 
pendence of Archimedean copulas, elliptical copulas and copulas derived from 
factor models. 


Archimedean Copulas 


The generator of an Archimedean copula fully embodies the properties of
dependence (and therefore of extreme dependence). As a consequence, the
coefficient of tail dependence of an Archimedean copula can be expressed solely
in terms of its generator. A simple application of L’Hospital’s rule shows that
any Archimedean copula with a strict generator $\varphi$ (that is, such that
$\varphi(0)$ is infinite so that $\varphi^{[-1]} = \varphi^{-1}$) has a
coefficient of upper tail dependence given by

$$ \lambda_U = 2 - 2 \lim_{t \to 0} \frac{{\varphi^{-1}}'(2t)}{{\varphi^{-1}}'(t)}. \tag{4.85} $$

As a consequence, if ${\varphi^{-1}}'(0) > -\infty$, the coefficient of upper
tail dependence is identically zero. For an Archimedean copula to present tail
dependence, it is necessary that $\lim_{t \to 0} {\varphi^{-1}}'(t) = -\infty$.

Similarly, the coefficient of lower tail dependence is

$$ \lambda_L = 2 \lim_{t \to \infty} \frac{{\varphi^{-1}}'(2t)}{{\varphi^{-1}}'(t)}, \tag{4.86} $$

so that ${\varphi^{-1}}'(\infty)$ must be equal to 0 in order for the
Archimedean copula to have a nonzero lower tail dependence.

Table 4.2. Expressions of the coefficient of upper and lower tail dependence for
three Archimedean copulas. Note that the usual range for the parameter $\theta$
of Clayton’s copula has to be restricted to $[0, \infty]$ in order for the
generator to be “strict”

Copula    $\varphi^{-1}(t)$                                       $\lambda_L$       $\lambda_U$          Range
Clayton   $(1 + \theta t)^{-1/\theta}$                            $2^{-1/\theta}$   0                    $\theta \in [0, \infty]$
Gumbel    $\exp(-t^{1/\theta})$                                   0                 $2 - 2^{1/\theta}$   $\theta \in [1, \infty]$
Frank     $-\frac{1}{\theta} \ln[1 - (1 - e^{-\theta}) e^{-t}]$   0                 0                    $\theta \in [-\infty, \infty]$


Table 4.2 gives the coefficients of tail dependence of several Archimedean
copulas. It illustrates the fact that some copulas have an upper tail
dependence but no lower tail dependence (the Gumbel copula) or, on the contrary,
some copulas have no upper tail dependence but have a lower tail dependence
(Clayton copula). More precisely, the coefficient of lower tail dependence of
Clayton’s copula equals $2^{-1/\theta}$ while the coefficient of upper tail
dependence of Gumbel’s copula is $2 - 2^{1/\theta}$. In addition, the generator
of Clayton’s copula is regularly varying at t = 0 (see Table 4.1), with a tail
index $-\theta$, while the generator of Gumbel’s copula is regularly varying at
t = 1, with tail index $\theta$.
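
These entries can be cross-checked numerically from the inverse generators alone, using the ratios that precede the application of L’Hospital’s rule in (4.85) and (4.86). This is only a sketch: the inverse generators are those listed in Table 4.2, while the function names and the evaluation points (picked by hand to stay clear of floating-point underflow) are assumptions of the example.

```python
import math


def clayton_inverse(theta):
    return lambda t: (1.0 + theta * t) ** (-1.0 / theta)


def gumbel_inverse(theta):
    return lambda t: math.exp(-t ** (1.0 / theta))


def frank_inverse(theta):
    return lambda t: -math.log1p(-(1.0 - math.exp(-theta)) * math.exp(-t)) / theta


def lower_tail(inv, t):
    """lambda_L ~ inv(2t) / inv(t) for large t, the ratio behind (4.86)."""
    return inv(2.0 * t) / inv(t)


def upper_tail(inv, t=1e-8):
    """lambda_U ~ 2 - (1 - inv(2t)) / (1 - inv(t)) for small t, the ratio behind (4.85)."""
    return 2.0 - (1.0 - inv(2.0 * t)) / (1.0 - inv(t))


THETA = 2.0
cases = [("Clayton", clayton_inverse(THETA), 1e12),
         ("Gumbel", gumbel_inverse(THETA), 1e5),
         ("Frank", frank_inverse(THETA), 300.0)]
for name, inv, t_large in cases:
    print(f"{name}: lambda_L ~ {lower_tail(inv, t_large):.4f}, "
          f"lambda_U ~ {upper_tail(inv):.4f}")
```

For $\theta = 2$ the sketch should return values close to $2^{-1/2}$ for the lower coefficient of Clayton’s copula, close to $2 - \sqrt{2}$ for the upper coefficient of Gumbel’s copula, and close to zero in all other cases.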

In fact, one can show that any Archimedean copula with a generator regularly
varying at zero with tail index $-\theta$ (with $\theta > 0$) has a coefficient
of lower tail dependence equal to $2^{-1/\theta}$. Indeed, by (4.79),

$$ \lambda_L = \lim_{u \to 0^+} \frac{\varphi^{-1}(2\varphi(u))}{u} = \lim_{x \to \infty} \frac{\varphi^{-1}(2x)}{\varphi^{-1}(x)}, \tag{4.87} $$

and, since $\varphi$ is regularly varying with tail index $-\theta$,
$\varphi^{-1}$ is also regularly varying with tail index $-1/\theta$ [57], so
that

$$ \lambda_L = 2^{-1/\theta}. \tag{4.88} $$

Similarly, any Archimedean copula with a generator regularly varying at 1 and
with tail index $\theta$ (with $\theta > 1$, in order to fulfill the convexity
requirement) has a coefficient of upper tail dependence equal to
$2 - 2^{1/\theta}$.

These results also apply to the frailty model with frailty parameters having
a distribution regularly varying at zero with tail index $1/\theta$. Using the
properties of the Laplace transform, one can conclude that the copulas
generated by such frailty models have generators which are regularly varying at
zero with tail index $-\theta$. Therefore, they have a coefficient of lower tail
dependence equal to $2^{-1/\theta}$. Similarly, copulas generated by frailty
models with frailty parameters with distribution regularly varying at infinity,
with tail index $1/\theta$ ($\theta > 1$), have a coefficient of upper tail
dependence equal to $2 - 2^{1/\theta}$, since they lead to generators which are
regularly varying at 1, with tail index $\theta$. Finally, to obtain an
Archimedean copula with both upper and lower tail dependence, one just has to
consider generators which are regularly varying at 0 and 1, or alternatively to
have frailty parameters with regular variation at zero and at infinity.

Fig. 4.7. Contour plot of the copula with generator (4.89) (left panel) and of its
density (right panel) for the parameter values $\alpha = 1$ and $\beta = 2$

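An example is the following generator:

$$ \varphi(t) = t^{-\alpha} \cdot (-\ln t)^{\beta}, \quad (\alpha, \beta) \in [0, \infty) \times [1, \infty), \tag{4.89} $$

with inverse

$$ \varphi^{-1}(t) = \exp\left[-\frac{\beta}{\alpha} \cdot W\left(\frac{\alpha}{\beta}\, t^{1/\beta}\right)\right], \tag{4.90} $$

where $W(\cdot)$ denotes the Lambert function, solution of
$W(x) \cdot e^{W(x)} = x$. It allows for upper and lower tail dependence with
$\lambda_L = 2^{-1/\alpha}$ and $\lambda_U = 2 - 2^{1/\beta}$. Figure 4.7 shows
this copula for $\alpha = 1$ and $\beta = 2$, corresponding to
$\lambda_L = 0.5$ and $\lambda_U = 2 - \sqrt{2} \approx 0.6$.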
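
The inverse (4.90) and the two tail coefficients can be checked numerically. In the sketch below, the Lambert function is computed by a hand-rolled Newton iteration restricted to non-negative arguments, an assumption made to keep the example free of external dependencies; all names and evaluation levels are illustrative, and the sketch requires $\alpha > 0$.

```python
import math


def lambert_w(x, tol=1e-14, max_iter=100):
    """Principal branch of the Lambert function for x >= 0 (Newton's method)."""
    if x < 0.0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # lies at or above the root, so the iterates decrease towards it
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def generator(t, alpha, beta):
    """Generator (4.89)."""
    return t ** (-alpha) * (-math.log(t)) ** beta


def generator_inverse(s, alpha, beta):
    """Inverse generator (4.90), valid for alpha > 0."""
    return math.exp(-(beta / alpha) * lambert_w((alpha / beta) * s ** (1.0 / beta)))


def diagonal(u, alpha, beta):
    """Diagonal section C(u,u) of the Archimedean copula built on (4.89)."""
    return generator_inverse(2.0 * generator(u, alpha, beta), alpha, beta)


ALPHA, BETA = 1.0, 2.0
u_low, u_high = 1e-100, 1.0 - 1e-6
lambda_l = diagonal(u_low, ALPHA, BETA) / u_low
lambda_u = (1.0 - 2.0 * u_high + diagonal(u_high, ALPHA, BETA)) / (1.0 - u_high)
print(f"lambda_L ~ {lambda_l:.3f}, lambda_U ~ {lambda_u:.3f}")
```

The lower coefficient is evaluated at a very small probability level because the logarithmic factor of the generator is only slowly varying, so that convergence towards $2^{-1/\alpha}$ is slow.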


Elliptical Copulas 


Assuming that (X,Y) are normally distributed with correlation coefficient
$\rho$, it can be shown that, for all $\rho \in [-1, 1)$, $\lambda = 0$, while
$\bar{\lambda} = \rho$ [149, 229]. This latter result expresses, as one can
expect, that, despite the absence of tail dependence, extremes appear more
likely together for positively correlated variables.

In contrast, if (X,Y) have a Student’s copula, one can show that the tail
dependence coefficient is

$$ \lambda = 2 \cdot \bar{T}_{\nu+1}\left(\sqrt{\nu + 1}\, \sqrt{\frac{1 - \rho}{1 + \rho}}\right), \tag{4.91} $$

which is greater than zero for all $\rho > -1$, and thus $\bar{\lambda} = 1$.
Here $T_{\nu+1}$ is the Student distribution with $\nu + 1$ degrees of freedom
and the bar denotes the complementary distribution. This result
$\bar{\lambda} = 1$ proves that extremes appear more likely together whatever
the correlation coefficient may be, showing that, in fact, there is no general
relationship between the asymptotic dependence and the linear correlation
coefficient. Figure 4.8 shows the coefficient of upper tail dependence as a
function of the correlation coefficient $\rho$ for various values of the number
of degrees of freedom $\nu$.

Fig. 4.8. Coefficient of upper tail dependence as a function of the correlation
coefficient $\rho$ for various values of the number of degrees of freedom $\nu$
for Student’s copula
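
Expression (4.91) is easy to evaluate once the complementary Student distribution is available. The sketch below integrates the Student density with Simpson’s rule, a choice made here only to avoid external dependencies; the function names and the number of integration steps are assumptions, and the use of math.gamma limits the sketch to moderate numbers of degrees of freedom.

```python
import math


def student_pdf(x, dof):
    """Density of the standard Student distribution with dof degrees of freedom."""
    c = math.gamma((dof + 1.0) / 2.0) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2.0))
    return c * (1.0 + x * x / dof) ** (-(dof + 1.0) / 2.0)


def student_sf(x, dof, n=2000):
    """Complementary distribution for x >= 0: 1/2 minus a Simpson integral on [0, x]."""
    if x < 0.0:
        raise ValueError("this sketch only handles x >= 0")
    if x == 0.0:
        return 0.5
    h = x / n  # n must be even for Simpson's rule
    total = student_pdf(0.0, dof) + student_pdf(x, dof)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * student_pdf(k * h, dof)
    return 0.5 - total * h / 3.0


def student_copula_tail_dependence(rho, nu):
    """Tail dependence coefficient (4.91) of the Student copula, for -1 < rho <= 1."""
    if not -1.0 < rho <= 1.0:
        raise ValueError("rho must lie in (-1, 1]")
    x = math.sqrt(nu + 1.0) * math.sqrt((1.0 - rho) / (1.0 + rho))
    return 2.0 * student_sf(x, nu + 1.0)


for rho in (-0.5, 0.0, 0.5, 0.8):
    print(f"rho = {rho:+.1f}, nu = 3: lambda ~ {student_copula_tail_dependence(rho, 3.0):.3f}")
```

For $\nu = 1$ and $\rho = 0$, the Student distribution with two degrees of freedom has a closed-form distribution function, which gives $\lambda = 1 - 1/\sqrt{2}$ and provides a convenient sanity check.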

These distinctive properties of the Gaussian and Student’s copulas, char- 
acterized by the absence or presence of tail dependence, are illustrated in 
Fig. 4.9 which shows the realizations of two random variables with identical 
standard Gaussian marginals, with a Gaussian copula or a Student’s copula 
with three degrees of freedom and the same correlation coefficient $\rho = 0.8$.
In the right panel for the Student’s copula, the realizations (dots) are found 
to lie within a diamond-shaped domain with narrower and narrower tips as 
more extreme values are considered. This phenomenon can be observed, not 
only for the bottom-left and upper-right quadrants, but also for the upper-left 
and bottom-right quadrants. This results from the fact that the tail depen- 
dence coefficient remains nonzero even for negative correlation coefficients as 
illustrated in Fig. 4.8. 

The Gaussian and Student’s distributions are two examples of elliptical
distributions. More generally, the following result is known: elliptically
distributed random variables present a nonzero tail dependence if and only if
they are regularly varying, i.e., their distributions behave asymptotically like
power laws with some exponent $\nu > 0$ [239]. In such a case, for every
regularly varying pair of random variables which are elliptically distributed,
the coefficient of tail dependence $\lambda$ is given by expression (4.91). This
result is natural since the correlation coefficient is an invariant quantity
within the class of elliptical distributions and since the coefficient of tail
dependence is only determined by the asymptotic behavior of the distribution,
so that it does not matter whether the distribution is a Student’s distribution
with $\nu$ degrees of freedom or any other elliptical distribution as long as
they have the same asymptotic behavior in the tail.

Fig. 4.9. Realizations of two random variables with Gaussian marginals and with
a Gaussian copula (left panel) and a Student’s copula with three degrees of
freedom (right panel) with the same correlation coefficient $\rho = 0.8$


Linear Factor Models 


Consider the one-factor model

$$ X_1 = \beta_1 \cdot Y + \varepsilon_1, \tag{4.92} $$
$$ X_2 = \beta_2 \cdot Y + \varepsilon_2, \tag{4.93} $$

where the $\varepsilon_i$’s are random variables independent of Y and the
$\beta_i$’s are non-random positive coefficients.

The tail dependence $\lambda$ of $X_1$ and $X_2$ can be simply expressed as the
minimum of the tail dependence coefficients $\lambda_1$ and $\lambda_2$ between
the two random variables $X_1$ and Y, on the one hand, and $X_2$ and Y, on the
other hand [332, 335]:

$$ \lambda = \min\{\lambda_1, \lambda_2\}. \tag{4.94} $$

To understand this result, note that the tail dependence between $X_1$ and
$X_2$ is created only through the common factor Y. It is thus natural that the
tail dependence between $X_1$ and $X_2$ is bounded from above by the weakest
tail dependence between the $X_i$’s and Y, while deriving the equality requires
more work. The result (4.94) generalizes to an arbitrary number of random
variables and shows that the study of the tail dependence in linear factor
models can be reduced to the analysis of the tail dependence between each
individual $X_i$ and the common factor Y. In the following, we thus omit the
subscript i and consider without loss of generality one X linearly regressed on
a factor Y according to $X = \beta \cdot Y + \varepsilon$.

A general result concerning the tail dependence generated by factor models
for any kind of factor and noise distributions is as follows [332, 335]: the
coefficient of (upper) tail dependence between X and Y is given by

$$ \lambda = \int_{\max\{1,\, l/\beta\}}^{\infty} dx\, f(x), \tag{4.95} $$

where, provided that they exist,

$$ l = \lim_{u \to 1} \frac{F_X^{-1}(u)}{F_Y^{-1}(u)}, \tag{4.96} $$
$$ f(x) = \lim_{t \to \infty} \frac{t \cdot P_Y(t \cdot x)}{\bar{F}_Y(t)}, \tag{4.97} $$

where $F_X$ and $F_Y$ are the marginal distribution functions of X and Y
respectively, $\bar{F}_Y = 1 - F_Y$, and $P_Y$ is the density of Y.

As a direct consequence, one can show that any rapidly varying factor, 
which encompasses the Gaussian, the exponential or the gamma distributed 
factors for instance, leads to a vanishing coefficient of tail dependence, what- 
ever the distribution of the idiosyncratic noise may be. This result is obvious 
when both the factor and the idiosyncratic noise are normally distributed, 
since then X and Y follow a bivariate Gaussian distribution, whose tail de- 
pendence has been said to be zero. 

On the contrary, regularly varying factors, like the Student’s distributed
factors, lead to a tail dependence, provided that the distribution of the
idiosyncratic noise does not become fatter-tailed than the factor distribution.
One can thus conclude that, in order to generate tail dependence, the factor
must have a sufficiently “wild” distribution. To present an explicit example,
let us assume now that the factor Y and the idiosyncratic noise $\varepsilon$
have centered Student’s distributions with the same number $\nu$ of degrees of
freedom and scale factors respectively equal to 1 and $\sigma$. The choice of
the scale factor equal to 1 for Y is not restrictive but only provides a
convenient normalization for $\sigma$. Appendix 4.A shows that the tail
dependence coefficient is given by

$$ \lambda = \frac{1}{1 + \left(\frac{\sigma}{\beta}\right)^{\nu}}. \tag{4.98} $$

This expression shows that, the larger the typical scale $\sigma$ of the
fluctuation of $\varepsilon$ and the weaker the coupling coefficient $\beta$,
the smaller is the tail dependence, in accordance with intuition.


The linear correlation coefficient $\rho$ for the one-factor model is given by
$\rho = (1 + \sigma^2/\beta^2)^{-1/2}$, which allows us to rewrite the
coefficient of upper tail dependence in terms of $\rho > 0$ and $\nu > 2$:

$$ \lambda = \frac{\rho^{\nu}}{\rho^{\nu} + (1 - \rho^2)^{\nu/2}} = \frac{1}{1 + \left(\frac{1 - \rho^2}{\rho^2}\right)^{\nu/2}}. \tag{4.99} $$

Surprisingly, $\lambda$ does not go to zero for all $\rho$’s as $\nu$ goes to
infinity, as could be anticipated from the fact that the Student’s distribution
converges to the Gaussian distribution, which is known to have zero tail
dependence. Expression (4.99) predicts that $\lambda \to 0$ when
$\nu \to \infty$ for all $\rho$’s smaller than $1/\sqrt{2}$. But, and here lies
the surprise, $\lambda \to 1$ for all $\rho$ larger than $1/\sqrt{2}$ when
$\nu \to \infty$. This counterintuitive result is due to a non-uniform
convergence which makes the order of the two limits non-commutative: taking
first the limit $u \to 1$ and then $\nu \to \infty$ is different from taking
first the limit $\nu \to \infty$ and then $u \to 1$. In a sense, by taking first
the limit $u \to 1$, we always ensure the power law regime even if $\nu$ is
later taken to infinity. This is different from first “sitting” on the Gaussian
limit $\nu \to \infty$. This behavior reveals the sometimes paradoxical
consequences of taking the limit $u \to 1$ in the definition of the tail
dependence.
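
The threshold behavior at $\rho = 1/\sqrt{2}$ can be made concrete with a few evaluations of the closed-form expressions for the Student’s factor model. This is a minimal sketch; the function names and the parameter values are illustrative, and very large exponents would overflow in floating-point arithmetic.

```python
import math


def factor_tail_dependence(beta, sigma, nu):
    """Tail dependence of the Student's factor model in terms of beta, sigma, nu."""
    return 1.0 / (1.0 + (sigma / beta) ** nu)


def correlation(beta, sigma):
    """Linear correlation between X = beta * Y + eps and Y: (1 + sigma^2/beta^2)^(-1/2)."""
    return 1.0 / math.sqrt(1.0 + (sigma / beta) ** 2)


def factor_tail_dependence_from_rho(rho, nu):
    """Same coefficient rewritten in terms of the correlation 0 < rho <= 1."""
    return 1.0 / (1.0 + ((1.0 - rho * rho) / (rho * rho)) ** (nu / 2.0))


for rho in (0.6, 1.0 / math.sqrt(2.0), 0.8):
    values = ", ".join(f"{factor_tail_dependence_from_rho(rho, nu):.3g}"
                       for nu in (3, 10, 50, 200))
    print(f"rho = {rho:.3f}: lambda for nu = 3, 10, 50, 200 -> {values}")
```

As the number of degrees of freedom grows, the coefficient drifts towards 0 below the threshold and towards 1 above it, while it stays at 1/2 exactly at the threshold.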

As an illustration, Fig. 4.10 presents the coefficient of tail dependence for
the Student’s factor model as a function of $\rho$ for various values of $\nu$.
It is interesting to compare this figure with Fig. 4.8 depicting the coefficient
of tail dependence for the Student’s copula. Note that $\lambda$ vanishes for
all negative $\rho$’s in the case of the factor model, while $\lambda$ remains
nonzero for negative values of the correlation coefficient for bivariate
Student’s variables.

If Y and $\varepsilon$ have different numbers $\nu_Y$ and $\nu_\varepsilon$ of
degrees of freedom, two cases occur. For $\nu_Y < \nu_\varepsilon$,
$\varepsilon$ is negligible asymptotically and $\lambda = 1$. For
$\nu_Y > \nu_\varepsilon$, X becomes asymptotically identical to $\varepsilon$.
Then, X and Y have the same tail dependence as $\varepsilon$ and Y, which is
zero by construction.

A straightforward generalization of this result can be derived for the
multifactor model [72]:

$$ X_1 = \beta_{1,1} \cdot Y_1 + \cdots + \beta_{1,n} \cdot Y_n + \varepsilon_1, \tag{4.100} $$
$$ X_2 = \beta_{2,1} \cdot Y_1 + \cdots + \beta_{2,n} \cdot Y_n + \varepsilon_2. \tag{4.101} $$

The following generalization of (4.98) gives the coefficient of tail dependence
between $X_1$ and one of the $Y_i$ as

$$ \lambda_{1,i} = \frac{\beta_{1,i}^{\nu}}{\sum_{j=1}^{n} \beta_{1,j}^{\nu} + \sigma^{\nu}}, \tag{4.102} $$

provided that the $Y_i$’s remain independent factors. For simplicity, we have
assumed that all the factors are standardized, i.e., their scale factors are all
equal to one. Generalizing expression (4.94), the coefficient of tail dependence
between $X_1$ and $X_2$ is

$$ \lambda = \sum_{i=1}^{n} 1_{\{\beta_{1,i} \cdot \beta_{2,i} > 0\}} \cdot \min\left(\lambda_{1,i}, \lambda_{2,i}\right). \tag{4.103} $$

These results are of particular interest for portfolio analysis and risk
management, as we shall see in the next section.

Fig. 4.10. Coefficient of upper tail dependence as a function of the correlation
coefficient $\rho$ for various values of the number of degrees of freedom $\nu$
for the Student’s factor model


4.5.4 Practical Implications 


Let us now give two straightforward applications of the tail dependence for 
financial purposes. We also refer the reader to [390] for other financial appli- 
cations. 


Portfolio Tail Risk Management 


Table 4.3 presents the results obtained on the estimations of the upper and
lower coefficients of tail dependence between several major stocks and the
market represented here by the Standard & Poor’s 500, over the last decade.
The estimation has been performed under the assumption that (4.92-4.93)
hold, in which the factor is represented by the Standard & Poor’s 500. Using
the market index as the factor is reasonable since, according to standard
financial theory, the market’s return is well-known to be the most important
explanatory factor for the return of each individual asset.⁴ The coefficient of
tail dependence between any two assets is then easily derived from (4.94). It
is interesting to observe that the coefficients of tail dependence seem almost
identical in the lower and the upper tail. Nonetheless, the coefficient of lower
tail dependence is never smaller than the upper one and is usually slightly
larger, showing that large losses are more likely to come together compared
with large gain occurrences.

⁴ In a situation where the common factor cannot be easily identified or estimated,
the results for elliptic distributions obtained in Sect. 4.5.3 may provide a
convenient alternative.

Table 4.3. This table presents the coefficients of lower and of upper tail
dependence of the companies traded on the NYSE and listed in the first column
with the Standard & Poor’s 500. The returns used for the calculations are
sampled in the time interval from January 1991 to December 2000. The numbers
within the parentheses are the estimated standard deviations of the empirical
coefficients of tail dependence. Reproduced from [332]

                                λ_L            λ_U
Bristol-Myers Squibb Co.        0.16 (0.03)    0.14 (0.01)
Chevron Corp.                   0.05 (0.01)    0.03 (0.01)
Hewlett-Packard Co.             0.13 (0.01)    0.12 (0.01)
Coca-Cola Co.                   0.12 (0.01)    0.09 (0.01)
Minnesota Mining & MFG Co.      0.07 (0.01)    0.06 (0.01)
Philip Morris Cos Inc.          0.04 (0.01)    0.04 (0.01)
Procter & Gamble Co.            0.12 (0.02)    0.09 (0.01)
Pharmacia Corp.                 0.06 (0.01)    0.04 (0.01)
Schering-Plough Corp.           0.12 (0.01)    0.11 (0.01)
Texaco Inc.                     0.04 (0.01)    0.03 (0.01)
Texas Instruments Inc.          0.17 (0.02)    0.12 (0.01)
Walgreen Co.                    0.11 (0.01)    0.09 (0.01)

Two clusters of assets clearly stand out: those with a tail dependence of
about 10% (or more) and those with a tail dependence of about 5%. Let
us exploit this observation and explore some consequences of the existence
of stocks with drastically different tail dependence coefficients with the
index. These stocks offer the interesting possibility of constructing a
prudential portfolio which can be significantly less sensitive to the large
market moves. Figure 4.11 compares the daily returns of the Standard & Poor’s
500 with those of two portfolios $P_1$ and $P_2$: $P_1$ is made of the four
stocks (Chevron Corp., Philip Morris Cos Inc., Pharmacia Corp., and Texaco
Inc.) with the smallest $\lambda$’s while $P_2$ is made of the four stocks
(Bristol-Myers Squibb Co., Hewlett-Packard Co., Schering-Plough Corp., and
Texas Instruments Inc.) with the largest $\lambda$’s. In fact, we have
constructed two variants of $P_1$ and two variants of $P_2$. The first variant
consists in choosing the same weight 1/4 for each asset in each class of assets
(with small $\lambda$’s for $P_1$ and large $\lambda$’s for $P_2$). The second
variant has asset weights in each class chosen in addition to minimize the
variance of the resulting portfolio. We find that the results are almost the
same between the equally weighted and minimum-variance portfolios. This makes
sense since the tail dependence coefficient of a bivariate random vector does
not depend on the variances of the components, which only account for the price
moves of moderate amplitudes.
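
The selection of the two groups of stocks can be sketched from the point estimates of Table 4.3. The ranking rule used below (lower tail dependence first, ties broken by the upper coefficient) is an assumption of this sketch rather than a procedure stated in the text, and the function names are illustrative.

```python
# (lambda_L, lambda_U) point estimates copied from Table 4.3.
TAIL_DEPENDENCE = {
    "Bristol-Myers Squibb": (0.16, 0.14),
    "Chevron": (0.05, 0.03),
    "Hewlett-Packard": (0.13, 0.12),
    "Coca-Cola": (0.12, 0.09),
    "Minnesota Mining & MFG": (0.07, 0.06),
    "Philip Morris": (0.04, 0.04),
    "Procter & Gamble": (0.12, 0.09),
    "Pharmacia": (0.06, 0.04),
    "Schering-Plough": (0.12, 0.11),
    "Texaco": (0.04, 0.03),
    "Texas Instruments": (0.17, 0.12),
    "Walgreen": (0.11, 0.09),
}


def pairwise_lower_tail_dependence(a, b, table=TAIL_DEPENDENCE):
    """Minimum rule (4.94) applied to the lower coefficients of two stocks."""
    return min(table[a][0], table[b][0])


def rank_by_tail_dependence(table=TAIL_DEPENDENCE):
    """Sort names by increasing lambda_L, ties broken by lambda_U (assumed rule)."""
    return sorted(table, key=lambda name: table[name])


ranking = rank_by_tail_dependence()
p1 = ranking[:4]   # four stocks with the smallest tail dependence
p2 = ranking[-4:]  # four stocks with the largest tail dependence
print("P1:", p1)
print("P2:", p2)
print("Chevron / Texas Instruments:",
      pairwise_lower_tail_dependence("Chevron", "Texas Instruments"))
```

With this rule, the two groups coincide with the compositions of the two portfolios described above.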

Figure 4.11 presents the results for the equally weighted portfolios generated
from the two groups of assets. Observe that only one large drop occurs
simultaneously for $P_1$ and for the Standard & Poor’s 500, in contrast with
$P_2$ for which several large drops are associated with the largest drops of
the index and only a few occur desynchronized. The figure clearly shows an
almost circular scatter plot for the large moves of $P_1$ and the index compared
with a rather narrow ellipse, whose long axis is approximately along the first
diagonal, for the large returns of $P_2$ and the index, illustrating that the
small tail dependence between the index and the four stocks in $P_1$
automatically implies that their mutual tail dependence is also very small,
according to (4.94). As a consequence, $P_1$ offers a better diversification
with respect to large drops than $P_2$. This effect, already quite significant
for such small portfolios, should be overwhelming for large ones. The most
interesting result stressed in Fig. 4.11 is that optimizing for minimum tail
dependence automatically diversifies away the large risks.


Fig. 4.11. Daily returns of two equally weighted portfolios $P_1$ (made of four
stocks with small $\lambda \le 0.06$) and $P_2$ (made of four stocks with large
$\lambda \ge 0.12$) as a function of the daily returns of the Standard & Poor’s
500 over the period January 1991 to December 2000. The straight (resp. dashed)
line represents the regression of portfolio $P_1$ (resp. $P_2$) on the Standard
& Poor’s 500. Reproduced from [332]


These advantages of portfolio $P_1$ with small tail dependence compared
with portfolio $P_2$ with large tail dependence with respect to the Standard &
Poor’s 500 index come at almost no cost in terms of the daily Sharpe ratio,
equal respectively to 0.058 and 0.061 for the equally weighted and minimum
variance $P_1$ and to 0.069 and 0.071 for the equally weighted and minimum
variance $P_2$.

The straight lines in Fig. 4.11 represent the linear regressions of the returns
of the two portfolios on the index returns, and show that there is significantly
less linear correlation between $P_1$ and the index (correlation coefficient of
0.52 for both the equally weighted and the minimum variance $P_1$) compared with
$P_2$ and the index (correlation coefficient of 0.73 for the equally weighted
$P_2$ and of 0.70 for the minimum variance $P_2$). Theoretically, it is possible
to construct two random variables with small correlation coefficient and large
$\lambda$ and vice-versa. Recall that the correlation coefficient and the tail
dependence coefficient are two opposite end-members of dependence measures: the
correlation coefficient quantifies the dependence between relatively small moves
while the tail dependence coefficient measures the dependence during extreme
events. The finding that $P_1$ comes with both the smallest correlation and the
smallest tail dependence coefficients suggests that they are not independent
properties of assets. This intuition is in fact explained and encompassed by the
factor model since the larger $\beta$ is, the larger is the correlation
coefficient and the larger is the tail dependence. Diversifying away extreme
shocks may provide a useful diversification tool for less extreme dependences,
thus improving the potential usefulness of a strategy of portfolio management
based on the tail dependence proposed here.


Impact on Dependent Default Modeling 


Consider N obligators with individual default probability $\pi_i$,
$i = 1, \ldots, N$, and default indicator

$$ D_i = 1_{\{X_i < T_i\}}, \tag{4.104} $$

where $(X_1, \ldots, X_N)$ denotes the vector of latent variables and
$(T_1, \ldots, T_N)$ the vector of thresholds below which default occurs.
Sect. 3.6.4 has shown that the probability that k obligators, labeled
$i_1, \ldots, i_k$, among N default is given by

$$ \Pr\left[D_{i_1} = 1, \ldots, D_{i_k} = 1\right] = C\left(\pi_{i_1}, \ldots, \pi_{i_k}\right), \tag{4.105} $$

where C is the copula of the latent variables $X_i$ under consideration.
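
To give an order of magnitude, the joint default probability of two obligators can be evaluated under different copulas of the latent variables. The sketch below is purely illustrative: the Clayton copula (which has lower tail dependence) and the two limiting copulas are choices made for this example, as are the function names and the default probabilities.

```python
import math


def independence_copula(u):
    """Product copula: the latent variables are independent."""
    return math.prod(u)


def comonotone_copula(u):
    """Upper Frechet-Hoeffding bound: perfectly dependent latent variables."""
    return min(u)


def clayton_copula(u, theta):
    """N-dimensional Clayton copula, theta > 0 (standard closed form)."""
    if theta <= 0.0:
        raise ValueError("this sketch requires theta > 0")
    s = sum(x ** (-theta) for x in u) - len(u) + 1.0
    return s ** (-1.0 / theta)


def joint_default_probability(default_probs, copula):
    """Probability that all listed obligators default: the copula evaluated at the pi's."""
    return copula(default_probs)


pis = [0.005, 0.005]  # two obligators with a 0.5% default probability each
print("independent:", joint_default_probability(pis, independence_copula))
print("Clayton, theta = 1:",
      joint_default_probability(pis, lambda u: clayton_copula(u, 1.0)))
print("comonotone:", joint_default_probability(pis, comonotone_copula))
```

For small default probabilities, the Clayton value is of the order of $\lambda_L \cdot \pi$ rather than $\pi^2$, which illustrates why the behavior of the copula near the corner matters for high-quality credits.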

This equation emphasizes the key role of the copula in credit risk modeling.
Since default probabilities are generally very low, specifically for very
high quality obligators, the behavior of the copula in the extreme is crucial.
As a consequence, it could seem natural that the presence or the absence of
tail dependence between the latent variables $X_i$ would be of particular
importance. When latent variables are asymptotically independent, as assumed in
traditional models exposed in Sect. 3.6.4, one can reasonably guess that such
models would underestimate the actual occurrence of concomitant defaults.

Table 4.4. Ratio of the 99% quantiles of the distribution of defaulting
obligators when the latent variables have a Student copula with $\nu$ degrees of
freedom, normalized with respect to the Gaussian copula, for portfolios of
10,000 homogeneous credits with default probability $\pi_i$ and correlation
$\rho$. The values of $\pi_i$ and $\rho$ are the same as in [186]

$\pi_i$    $\rho$    $\nu = 50$   $\nu = 10$   $\nu = 4$
0.01%      2.58%     2.33         5.62         6.00
0.50%      3.80%     1.66         3.75         6.84
7.50%      9.21%     1.09         1.39         1.78

This view has been advocated by Frey et al. [185, 186], among others. 
Considering large credit portfolios, they investigate the evolution of the total 
number of defaulting obligators when the dependence structure describing 
their interaction changes. Table 4.4 gives the ratios of the 99% quantiles of 
the distribution of defaulting obligators for a Student’s copula normalized with 
respect to the Gaussian copula, for three credit groups of different quality. The 
ratio of the quantiles increases when the number of degrees of freedom of the 
Student’s copula decreases, since the dependence between extremes becomes 
stronger. We also observe that the ratio of the quantiles decreases as the
quality of the obligator deteriorates, i.e., when the default probability increases. Indeed,
in such a case, the tail dependence has a weaker impact on the portfolio loss,
and therefore the exact shape of the copula in the neighborhood of (0,0) or 
of (1,1) is less important. 
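
The kind of experiment summarized in Table 4.4 can be sketched numerically. The block below is only an illustration under stated assumptions, not the procedure of [186]: it assumes a homogeneous one-factor latent model $X_i = \sqrt{\rho}\,Y + \sqrt{1-\rho}\,\epsilon_i$, and obtains the Student copula by dividing all Gaussian latent variables by a common $\sqrt{W/\nu}$ with $W \sim \chi^2_\nu$. The function name `default_count_quantile` and its parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def default_count_quantile(pi, rho, nu=None, n_credits=10_000,
                           n_sims=200_000, level=0.99, seed=0):
    """Quantile of the number of defaults in a homogeneous portfolio.

    nu=None: Gaussian copula for the latent variables (assumed one-factor).
    Otherwise: Student copula with nu degrees of freedom, same correlation.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_sims)                # common factor Y
    if nu is None:
        threshold = stats.norm.ppf(pi)             # Pr[X_i < T_i] = pi
        scale = np.ones(n_sims)
    else:
        threshold = stats.t.ppf(pi, df=nu)         # Student margins
        # common mixing variable shared by all obligators of a scenario
        scale = np.sqrt(rng.chisquare(nu, n_sims) / nu)
    # Conditional on (Y, W), defaults are independent with probability p
    p = stats.norm.cdf((threshold * scale - np.sqrt(rho) * y)
                       / np.sqrt(1.0 - rho))
    counts = rng.binomial(n_credits, p)            # defaults per scenario
    return np.quantile(counts, level)


if __name__ == "__main__":
    q_gauss = default_count_quantile(0.005, 0.038)
    q_student = default_count_quantile(0.005, 0.038, nu=4)
    print("ratio of 99% quantiles:", q_student / q_gauss)
```

The ratio printed by the sketch can be compared with the corresponding entry of Table 4.4, keeping in mind that it depends on the modeling assumptions listed above and on Monte Carlo error.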

Overall, these simulations tend to give substance to the assertion that the 
choice of the copula is fundamental. However, some recent studies support the 
opposite point of view. For the practical purpose of pricing credit derivatives, 
several authors [291, 430] have shown that the choice of the copula has in fact 
only a weak impact on the value of such contracts. As an example, Laurent
and Gregory [291] show that the premium for the first-to-default swap in
basket default swaps is almost the same for a Gaussian copula and for a Clayton
copula. Such results also hold for CDO tranches. Schloegl and O'Kane
confirm these results for the Student's copula [430], for which they find that it
does not provide significant improvement with respect to the Gaussian copula.

To sum up, for credit derivative pricing, the choice of the copula does not
appear to be crucial. However, taking into account the simulation results in
[186], we clearly see that the dependence structure has a real impact on the
loss distribution of the credit portfolio. Therefore, even if the copula is not so
important for derivative pricing, it could be crucial for establishing hedging
strategies. To our knowledge, this point has not yet been really explored,
but it appears to be an important future development of the research on credit
derivatives.


Appendix 
4.A Tail Dependence Generated by Student’s Factor Model 
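We consider two random variables $X$ and $Y$, related by
$$X = \beta Y + \varepsilon , \tag{4.A.1}$$
where $\varepsilon$ is a random variable independent of $Y$ and $\beta$ a nonrandom positive
coefficient. Let us assume that $Y$ and $\varepsilon$ have a Student's distribution with
density:
$$P_Y(y) = \frac{C_\nu}{\left(1 + \frac{y^2}{\nu}\right)^{\frac{\nu+1}{2}}} , \tag{4.A.2}$$
$$P_\varepsilon(\varepsilon) = \frac{C_\nu}{\sigma \left(1 + \frac{\varepsilon^2}{\nu \sigma^2}\right)^{\frac{\nu+1}{2}}} . \tag{4.A.3}$$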


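Lemma 4.5.1. The probability that $X$ is larger than $F_X^{-1}(u)$ knowing that $Y$
is larger than $F_Y^{-1}(u)$ is given by:
$$\Pr\left[X > F_X^{-1}(u) \,|\, Y > F_Y^{-1}(u)\right] = \bar F_\varepsilon(\eta) + \frac{\beta}{1-u} \int_{F_Y^{-1}(u)}^{\infty} dy\, \bar F_Y(y)\, P_\varepsilon\left[\beta F_Y^{-1}(u) + \eta - \beta y\right] , \tag{4.A.4}$$
with
$$\eta = F_X^{-1}(u) - \beta F_Y^{-1}(u) . \tag{4.A.5}$$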


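The proof of this lemma relies on a simple integration by parts and a change
of variable, which are detailed in Appendix 4.A.1.
Introducing the notation
$$\tilde Y_u = F_Y^{-1}(u) , \tag{4.A.6}$$
Appendix 4.A.2 shows that
$$\eta = \beta \left[\left(1 + \left(\frac{\sigma}{\beta}\right)^{\nu}\right)^{1/\nu} - 1\right] \tilde Y_u + O\left(\tilde Y_u^{-1}\right) . \tag{4.A.7}$$
This allows us to conclude that $\eta \to +\infty$ as $u \to 1$. Thus, $\bar F_\varepsilon(\eta) \to 0$ as $u \to 1$
and
$$\lambda = \lim_{u \to 1} \frac{\beta}{1-u} \int_{\tilde Y_u}^{\infty} dy\, \bar F_Y(y) \cdot P_\varepsilon\left(\beta \tilde Y_u + \eta - \beta y\right) . \tag{4.A.8}$$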


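Now, using the following result:


Lemma 4.5.2. Assuming $\nu > 0$ and $x_0 > 1$,
$$\lim_{\epsilon \to 0} \frac{1}{\epsilon} \int_1^{\infty} dx\, \frac{1}{x^{\nu}} \frac{C_\nu}{\left[1 + \frac{1}{\nu}\left(\frac{x - x_0}{\epsilon}\right)^2\right]^{\frac{\nu+1}{2}}} = \frac{1}{x_0^{\nu}} , \tag{4.A.9}$$
whose proof is given in Appendix 4.A.3, it is straightforward to show that
$$\lambda = \frac{1}{1 + \left(\frac{\sigma}{\beta}\right)^{\nu}} . \tag{4.A.10}$$
The final steps of this derivation are given in Appendix 4.A.4.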
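
As a quick numerical illustration of the result $\lambda = 1/(1 + (\sigma/\beta)^{\nu})$, the following sketch evaluates the coefficient of tail dependence of the Student factor model for a few parameter values. The function name `student_factor_tail_dependence` is hypothetical and only serves this example.

```python
def student_factor_tail_dependence(beta, sigma, nu):
    """Tail dependence between X = beta * Y + eps and Y when Y and eps are
    Student distributed with nu degrees of freedom and eps has scale sigma."""
    if beta <= 0 or sigma <= 0 or nu <= 0:
        raise ValueError("beta, sigma and nu must be positive")
    return 1.0 / (1.0 + (sigma / beta) ** nu)


if __name__ == "__main__":
    # With sigma < beta the factor dominates and lambda grows with nu;
    # with sigma > beta the noise dominates and lambda decays with nu.
    for nu in (2, 4, 10):
        print(nu,
              student_factor_tail_dependence(1.0, 0.5, nu),
              student_factor_tail_dependence(1.0, 2.0, nu))
```

When $\sigma = \beta$ the formula gives $\lambda = 1/2$ for every $\nu$, which is a convenient sanity check.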


4.A.1 Proof of Lemma 4.5.1 
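By definition,
$$\Pr\left[X > F_X^{-1}(u), Y > F_Y^{-1}(u)\right] = \int_{F_Y^{-1}(u)}^{\infty} dy \int_{F_X^{-1}(u)}^{\infty} dx\, P_Y(y) \cdot P_\varepsilon(x - \beta y) = \int_{F_Y^{-1}(u)}^{\infty} dy\, P_Y(y) \cdot \bar F_\varepsilon\left(F_X^{-1}(u) - \beta y\right) .$$
Let us perform an integration by parts:
$$\begin{aligned}
\Pr\left[X > F_X^{-1}(u), Y > F_Y^{-1}(u)\right]
&= \left[-\bar F_Y(y) \cdot \bar F_\varepsilon\left(F_X^{-1}(u) - \beta y\right)\right]_{F_Y^{-1}(u)}^{\infty} + \beta \int_{F_Y^{-1}(u)}^{\infty} dy\, \bar F_Y(y) \cdot P_\varepsilon\left(F_X^{-1}(u) - \beta y\right) \\
&= (1-u)\, \bar F_\varepsilon\left(F_X^{-1}(u) - \beta F_Y^{-1}(u)\right) + \beta \int_{F_Y^{-1}(u)}^{\infty} dy\, \bar F_Y(y) \cdot P_\varepsilon\left(F_X^{-1}(u) - \beta y\right) .
\end{aligned}$$
Defining $\eta = F_X^{-1}(u) - \beta F_Y^{-1}(u)$ (see (4.A.5)), and dividing each term by
$$\Pr\left[Y > F_Y^{-1}(u)\right] = 1 - u , \tag{4.A.11}$$
we obtain the result given in (4.A.4).


4.A.2 Derivation of Equation (4.A.7)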


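The factor $Y$ and the idiosyncratic noise $\varepsilon$ are distributed according to the
Student's distributions with $\nu$ degrees of freedom given by (4.A.2) and (4.A.3)
respectively. It follows that the survival distributions of $Y$ and $\varepsilon$ are:
$$\bar F_Y(y) = \frac{\nu^{\frac{\nu-1}{2}} C_\nu}{y^{\nu}} + O\left(y^{-(\nu+2)}\right) , \tag{4.A.12}$$
$$\bar F_\varepsilon(\varepsilon) = \frac{\sigma^{\nu} \nu^{\frac{\nu-1}{2}} C_\nu}{\varepsilon^{\nu}} + O\left(\varepsilon^{-(\nu+2)}\right) , \tag{4.A.13}$$
and
$$\bar F_X(x) = \frac{\left(\beta^{\nu} + \sigma^{\nu}\right) \nu^{\frac{\nu-1}{2}} C_\nu}{x^{\nu}} + O\left(x^{-(\nu+2)}\right) . \tag{4.A.14}$$
Using the notation (4.A.6), (4.A.5) can be rewritten as
$$\bar F_X\left(\eta + \beta \tilde Y_u\right) = \bar F_Y\left(\tilde Y_u\right) = 1 - u , \tag{4.A.15}$$
whose solution for large $\tilde Y_u$ (or equivalently as $u$ goes to 1) is
$$\eta = \beta \left[\left(1 + \left(\frac{\sigma}{\beta}\right)^{\nu}\right)^{1/\nu} - 1\right] \tilde Y_u + O\left(\tilde Y_u^{-1}\right) . \tag{4.A.16}$$
To obtain this equation, we have used the asymptotic expressions of $\bar F_X$ and
$\bar F_Y$ given in (4.A.14) and (4.A.12).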


4.A.3 Proof of Lemma 4.5.2 


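The change of variable
$$u = \frac{x - x_0}{\epsilon} , \tag{4.A.17}$$
gives
$$\begin{aligned}
\frac{1}{\epsilon} \int_1^{\infty} dx\, \frac{1}{x^{\nu}} \frac{C_\nu}{\left[1 + \frac{1}{\nu}\left(\frac{x - x_0}{\epsilon}\right)^2\right]^{\frac{\nu+1}{2}}}
&= \int_{\frac{1-x_0}{\epsilon}}^{\infty} du\, \frac{1}{(\epsilon u + x_0)^{\nu}} \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} \\
&= \frac{1}{x_0^{\nu}} \int_{\frac{1-x_0}{\epsilon}}^{\frac{x_0}{\epsilon}} du\, \frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}}
+ \int_{\frac{x_0}{\epsilon}}^{\infty} du\, \frac{1}{(\epsilon u + x_0)^{\nu}} \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} .
\end{aligned}$$
Consider the second integral in the right-hand side of the last equality. We
have
$$1 + \frac{u^2}{\nu} \ge \frac{u^2}{\nu} , \tag{4.A.18}$$
which allows us to write
$$\frac{1}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} \le \frac{\nu^{\frac{\nu+1}{2}}}{u^{\nu+1}} , \tag{4.A.19}$$
so that
$$\int_{\frac{x_0}{\epsilon}}^{\infty} du\, \frac{1}{(\epsilon u + x_0)^{\nu}} \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} \le \nu^{\frac{\nu+1}{2}} \int_{\frac{x_0}{\epsilon}}^{\infty} du\, \frac{1}{(\epsilon u + x_0)^{\nu}} \frac{C_\nu}{u^{\nu+1}} \tag{4.A.20}$$
$$= \frac{\nu^{\frac{\nu+1}{2}} \epsilon^{\nu}}{x_0^{2\nu}} \int_1^{\infty} dv\, \frac{C_\nu}{v^{\nu+1} (1+v)^{\nu}} \tag{4.A.21}$$
$$= O(\epsilon^{\nu}) . \tag{4.A.22}$$
The next step of the proof is to show that
$$\int_{\frac{1-x_0}{\epsilon}}^{\frac{x_0}{\epsilon}} du\, \frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} \longrightarrow 1 \quad \text{as } \epsilon \to 0 . \tag{4.A.23}$$
Let us calculate
$$\begin{aligned}
\left| \int_{\frac{1-x_0}{\epsilon}}^{\frac{x_0}{\epsilon}} du\, \frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} - \int_{-\infty}^{\infty} du\, \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} \right|
&= \left| \int_{\frac{1-x_0}{\epsilon}}^{\frac{x_0}{\epsilon}} du \left[\frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} - 1\right] \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} - \int_{-\infty}^{\frac{1-x_0}{\epsilon}} du\, \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} - \int_{\frac{x_0}{\epsilon}}^{\infty} du\, \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} \right| \\
&\le \left| \int_{\frac{1-x_0}{\epsilon}}^{\frac{x_0}{\epsilon}} du \left[\frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} - 1\right] \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} \right| + \int_{-\infty}^{\frac{1-x_0}{\epsilon}} du\, \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} + \int_{\frac{x_0}{\epsilon}}^{\infty} du\, \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} .
\end{aligned}$$

Fig. 4.12. The graph of the function (4.A.25) (thick solid line), the chord which
gives an upper bound of the function within $\left[\frac{1-x_0}{\epsilon}, 0\right]$ (dashed line) and the tangent
at $0^+$ which gives an upper bound of the function in the interval $\left(0, \frac{x_0}{\epsilon}\right]$ (dash-dotted
line). The horizontal axis is $u$, running from $(1-x_0)/\epsilon$ to $x_0/\epsilon$.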


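The second and third integrals obviously behave like $O(\epsilon^{\nu})$ when $\epsilon$ goes
to zero since we have assumed $x_0 > 1$, which ensures that $\frac{1-x_0}{\epsilon} \to -\infty$ and
$\frac{x_0}{\epsilon} \to \infty$ when $\epsilon \to 0^+$. For the first integral, we have
$$\left| \int_{\frac{1-x_0}{\epsilon}}^{\frac{x_0}{\epsilon}} du \left[\frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} - 1\right] \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} \right| \le \int_{\frac{1-x_0}{\epsilon}}^{\frac{x_0}{\epsilon}} du \left| \frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} - 1 \right| \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} . \tag{4.A.24}$$
The function
$$\left| \frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} - 1 \right| \tag{4.A.25}$$
vanishes at $u = 0$, is convex for $u \in \left[\frac{1-x_0}{\epsilon}, 0\right)$ and concave for $u \in \left(0, \frac{x_0}{\epsilon}\right]$ (see
Fig. 4.12), so that there are two constants $A, B > 0$ such that
$$\left| \frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} - 1 \right| \le -\frac{x_0^{\nu} - 1}{x_0 - 1} \cdot \epsilon \cdot u = -A \cdot \epsilon \cdot u , \quad \forall u \in \left[\frac{1-x_0}{\epsilon}, 0\right] , \tag{4.A.26}$$
$$\left| \frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} - 1 \right| \le \frac{\nu \epsilon}{x_0} \cdot u = B \cdot \epsilon \cdot u , \quad \forall u \in \left[0, \frac{x_0}{\epsilon}\right] . \tag{4.A.27}$$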


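We can thus conclude that
$$\int_{\frac{1-x_0}{\epsilon}}^{\frac{x_0}{\epsilon}} du \left| \frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} - 1 \right| \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} \le -A \cdot \epsilon \int_{\frac{1-x_0}{\epsilon}}^{0} du\, \frac{u\, C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} + B \cdot \epsilon \int_{0}^{\frac{x_0}{\epsilon}} du\, \frac{u\, C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} = O(\epsilon^{\alpha}) , \tag{4.A.28}$$
with $\alpha = \min\{\nu, 1\}$. Indeed, the two integrals can be calculated exactly, which
shows that they behave as $O(1)$ if $\nu > 1$ and as $O(\epsilon^{\nu-1})$ otherwise. Thus, we
finally obtain
$$\left| \int_{\frac{1-x_0}{\epsilon}}^{\frac{x_0}{\epsilon}} du\, \frac{1}{\left(1 + \frac{\epsilon u}{x_0}\right)^{\nu}} \frac{C_\nu}{\left(1 + \frac{u^2}{\nu}\right)^{\frac{\nu+1}{2}}} - 1 \right| = O(\epsilon^{\alpha}) . \tag{4.A.29}$$
Putting together (4.A.22) and (4.A.29) gives
$$\left| \frac{1}{\epsilon} \int_1^{\infty} dx\, \frac{1}{x^{\nu}} \frac{C_\nu}{\left[1 + \frac{1}{\nu}\left(\frac{x - x_0}{\epsilon}\right)^2\right]^{\frac{\nu+1}{2}}} - \frac{1}{x_0^{\nu}} \right| = O\left(\epsilon^{\min\{\nu, 1\}}\right) , \tag{4.A.30}$$
which concludes the proof.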


4.A.4 Derivation of Equation (4.A.10) 
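From (4.A.12), we can deduce
$$\bar F_Y(y) = \frac{\nu^{\frac{\nu-1}{2}} C_\nu}{y^{\nu}} \left(1 + O\left(y^{-2}\right)\right) . \tag{4.A.31}$$
Using (4.A.3) and (4.A.7), we obtain
$$P_\varepsilon\left(\beta \tilde Y_u + \eta - \beta y\right) = P_\varepsilon\left(\gamma \tilde Y_u - \beta y\right) \left(1 + O\left(\tilde Y_u^{-1}\right)\right) , \tag{4.A.32}$$
$$\gamma = \beta \left(1 + \left(\frac{\sigma}{\beta}\right)^{\nu}\right)^{1/\nu} . \tag{4.A.33}$$
Putting together these results yields for the leading order
$$\begin{aligned}
\int_{\tilde Y_u}^{\infty} dy\, \bar F_Y(y) \cdot P_\varepsilon\left(\beta \tilde Y_u + \eta - \beta y\right)
&= \int_{\tilde Y_u}^{\infty} dy\, \frac{\nu^{\frac{\nu-1}{2}} C_\nu}{y^{\nu}} \cdot \frac{C_\nu}{\sigma \left(1 + \frac{(\gamma \tilde Y_u - \beta y)^2}{\nu \sigma^2}\right)^{\frac{\nu+1}{2}}} \\
&= \frac{\nu^{\frac{\nu-1}{2}} C_\nu}{\beta\, \tilde Y_u^{\nu}} \cdot \frac{\beta \tilde Y_u}{\sigma} \int_1^{\infty} dx\, \frac{1}{x^{\nu}} \frac{C_\nu}{\left[1 + \frac{1}{\nu}\left(\frac{x - \gamma/\beta}{\sigma/(\beta \tilde Y_u)}\right)^2\right]^{\frac{\nu+1}{2}}} ,
\end{aligned} \tag{4.A.34}$$
where the change of variable $x = y/\tilde Y_u$ has been performed in the last equation.
We now apply Lemma 4.5.2 with $x_0 = \gamma/\beta > 1$ and $\epsilon = \sigma/(\beta \tilde Y_u)$, which goes to
zero as $u \to 1$. This gives
$$\int_{\tilde Y_u}^{\infty} dy\, \bar F_Y(y) \cdot P_\varepsilon\left(\beta \tilde Y_u + \eta - \beta y\right) \sim_{u \to 1} \frac{\nu^{\frac{\nu-1}{2}} C_\nu}{\beta\, \tilde Y_u^{\nu}} \left(\frac{\beta}{\gamma}\right)^{\nu} , \tag{4.A.35}$$
which shows that, since $1 - u = \bar F_Y(\tilde Y_u) \sim \nu^{\frac{\nu-1}{2}} C_\nu / \tilde Y_u^{\nu}$,
$$\frac{\beta}{1-u} \int_{\tilde Y_u}^{\infty} dy\, \bar F_Y(y) \cdot P_\varepsilon\left(\beta \tilde Y_u + \eta - \beta y\right) \longrightarrow \left(\frac{\beta}{\gamma}\right)^{\nu} \quad \text{as } u \to 1 .$$
Therefore
$$\Pr\left[X > F_X^{-1}(u) \,|\, Y > F_Y^{-1}(u)\right] \sim_{u \to 1} \left(\frac{\beta}{\gamma}\right)^{\nu} , \tag{4.A.36}$$
which finally yields
$$\lambda = \frac{1}{1 + \left(\frac{\sigma}{\beta}\right)^{\nu}} . \tag{4.A.37}$$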
5 Description of Financial Dependences with Copulas


There are two general methods for estimating empirically the copula best de- 
scribing the dependence structure of a basket of assets, and more generally of 
a portfolio made of different financial and/or actuarial risks: parametric and 
nonparametric. The latter class is by far the most general since it does not 
require the a priori specification of a model, and should thus avoid the prob- 
lem of misspecification (model error). In contrast, the parametric approach 
has the advantage that, if a model is correctly specified, it leads to a much 
more precise parametric estimation. In addition, the reduced number of para- 
meters involved in the description of the selected copula can be interpreted as 
being the relevant meaningful variables that summarize the dependence
properties between the assets. Consider for instance the Gaussian representation,
or more generally any representation in terms of elliptical distributions, whose
dependence structure is, to a large extent (see Chap. 4), summarized by the
set of linear coefficients of correlation. These coefficients of correlation thus
play a pivotal role and it is tempting to interpret them as the macrovariables 
(or phenomenological variables) synthesizing all possible microstructural in- 
teractions between economic agents leading to the observed dependence. Let 
us recall that identifying the “correct variables” constitutes the critical first 
step in model building to obtain the best possible representation of observed 
phenomena. The usefulness of the parametric estimation is thus obvious from 
this point of view. 

The first section of this chapter reviews the most representative methods 
to estimate copulas, with an emphasis on the description of parametric
approaches. The following section focuses on the problem of model selection and
on goodness-of-fit tests. Indeed, the estimation procedure makes no sense if the
quality and the likelihood of the model are not assessed. Instead of reviewing
the many available goodness-of-fit tests, we discuss how to best describe the 
dependence structure of asset returns and we compare the relative merits of 
the different models considered in the literature to address this question. 


5.1 Estimation of Copulas 


There is a significant body of literature on the estimation of copulas. This 
section aims at summarizing some of the most popular techniques which have 
appeared in the statistical literature and which are now of common use in 
modeling financial and economic variables as well as actuarial risks. 


5.1.1 Nonparametric Estimation 
The Empirical Copula 


The very first copula estimation method dates back to the work by
Deheuvels [121, 122]. It relies on a simple generalization of the usual estimator
of a multivariate distribution. Indeed, considering an $n$-dimensional random
vector $X = (X_1,\ldots,X_n)$ whose copula is $C$ and given a sample of size $T$,
$\{(x_1(1), x_2(1), \ldots, x_n(1)), \ldots, (x_1(T), x_2(T), \ldots, x_n(T))\}$, a natural idea is to
estimate the empirical distribution function $\hat F$ of $X$ as
$$\hat F(x) = \frac{1}{T} \sum_{k=1}^{T} \mathbf{1}_{\{x_1(k) \le x_1, \ldots, x_n(k) \le x_n\}} , \tag{5.1}$$
and the empirical marginal distribution functions of the $X_i$'s as
$$\hat F_i(x_i) = \frac{1}{T} \sum_{k=1}^{T} \mathbf{1}_{\{x_i(k) \le x_i\}} . \tag{5.2}$$

The application of Sklar's theorem would then appear to yield a nonparametric
estimation of the copula $C$. Unfortunately, even if the margins of $F$
are continuous, their empirical counterparts are not. Therefore, one cannot
determine a unique estimated copula $\hat C$.¹ Following this approach, one can,
however, obtain a unique nonparametric estimator of $C$ defined at the
discrete points $\left(\frac{i_1}{T}, \frac{i_2}{T}, \ldots, \frac{i_n}{T}\right)$, with $i_k \in \{1, 2, \ldots, T\}$. Inverting the empirical
marginal distribution function, we obtain
$$\hat C\left(\frac{i_1}{T}, \frac{i_2}{T}, \ldots, \frac{i_n}{T}\right) = \frac{1}{T} \sum_{k=1}^{T} \mathbf{1}_{\{x_1(k) \le x_1(i_1;T), \ldots, x_n(k) \le x_n(i_n;T)\}} , \tag{5.3}$$
where $x_p(k;T)$ denotes the $k$th order statistic of the sample $\{x_p(1), \ldots,
x_p(T)\}$. Following Deheuvels, one can define an empirical copula as any copula
which satisfies the relation (5.3).
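
The lattice estimator (5.3) only requires the ranks of the observations. A minimal bivariate sketch is given below, assuming a sample without ties; the function name `empirical_copula_lattice` is hypothetical, and the row and column of index 0 (where the copula vanishes) are added for convenience.

```python
import numpy as np


def empirical_copula_lattice(sample):
    """Empirical copula of a bivariate sample on the lattice {0, 1/T, ..., 1}^2.

    sample: array of shape (T, 2), assumed free of ties.
    Returns grid with grid[i, j] estimating C(i/T, j/T).
    """
    sample = np.asarray(sample, dtype=float)
    T = sample.shape[0]
    # rank (1..T) of each observation within its own margin
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1
    counts = np.zeros((T + 1, T + 1))
    # unit mass at the lattice point given by the two ranks of each observation
    np.add.at(counts, (ranks[:, 0], ranks[:, 1]), 1.0)
    # cumulative sums count the observations with rank_1 <= i and rank_2 <= j
    return counts.cumsum(axis=0).cumsum(axis=1) / T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = empirical_copula_lattice(rng.standard_normal((200, 2)))
    print(grid[100, 100])   # estimate of C(1/2, 1/2)
```

The array has $(T+1)^2$ entries, so this direct construction is only practical for moderate sample sizes.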

It is well known that the empirical distribution function $\hat F$ converges,
almost surely, uniformly to the underlying distribution function $F$ from which
the sample is drawn, as the sample size $T$ goes to infinity. This property still
holds for the nonparametric estimator defined by the empirical copula
$$\sup_{u \in [0,1]^n} \left| \hat C(u) - C(u) \right| \xrightarrow[T \to \infty]{\text{a.s.}} 0 . \tag{5.4}$$
Similarly, the empirical copula density $\hat c$ can be estimated by
$$\hat c\left(\frac{i_1}{T}, \ldots, \frac{i_n}{T}\right) = \begin{cases} \frac{1}{T} , & \text{if } \{x_1(i_1;T), \ldots, x_n(i_n;T)\} \text{ belongs to the sample,} \\ 0 , & \text{otherwise.} \end{cases} \tag{5.5}$$

¹ The same issue arises, of course, for the empirical estimation of marginal as well
as multivariate distributions.


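The following relation holds between the empirical copula $\hat C$ and the empirical
copula density:
$$\hat c\left(\frac{i_1}{T}, \ldots, \frac{i_n}{T}\right) = \sum_{k_1=1}^{2} \cdots \sum_{k_n=1}^{2} \left[ (-1)^{k_1 + \cdots + k_n} \times \hat C\left(\frac{i_1 - k_1 + 1}{T}, \ldots, \frac{i_n - k_n + 1}{T}\right) \right] . \tag{5.6}$$
A natural question arises: what is the estimated value of $\hat C(u)$ or $\hat c(u)$ when
$u$ does not belong to the lattice defined by the set of points $\left(\frac{i_1}{T}, \frac{i_2}{T}, \ldots, \frac{i_n}{T}\right)$,
with $i_k \in \{1, 2, \ldots, T\}$? It would seem that this is nothing but a straightforward
interpolation problem, which could be solved by constructing a simple
staircase function or applying spline functions, for instance. However,
such methods of interpolation do not ensure that the function so obtained
fulfills the requirements for a copula, according to Definition 3.2.1; in
particular, the function must be $n$-increasing, which requires a multilinear
interpolation scheme. In the bivariate case (for simplicity), given any point
$(u,v) \in \left[\frac{k_u}{T}, \frac{k_u+1}{T}\right) \times \left[\frac{k_v}{T}, \frac{k_v+1}{T}\right)$, where $k_u, k_v \in \{0, 1, \ldots, T-1\}$ denote the
integer parts of $T \cdot u$ and $T \cdot v$ respectively, the following interpolation
$$\begin{aligned}
\tilde C(u,v) ={} & \hat C\left(\frac{k_u}{T}, \frac{k_v}{T}\right) (k_u + 1 - T \cdot u)(k_v + 1 - T \cdot v) \\
& + \hat C\left(\frac{k_u+1}{T}, \frac{k_v}{T}\right) (T \cdot u - k_u)(k_v + 1 - T \cdot v) \\
& + \hat C\left(\frac{k_u}{T}, \frac{k_v+1}{T}\right) (k_u + 1 - T \cdot u)(T \cdot v - k_v) \\
& + \hat C\left(\frac{k_u+1}{T}, \frac{k_v+1}{T}\right) (T \cdot u - k_u)(T \cdot v - k_v)
\end{aligned} \tag{5.7}$$
defines a bona fide empirical copula. Indeed, by construction $\tilde C$ is a copula
(see [370, p. 16]) and $\tilde C\left(\frac{i}{T}, \frac{j}{T}\right) = \hat C\left(\frac{i}{T}, \frac{j}{T}\right)$ for all $i, j \in \{1, \ldots, T\}$.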
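
The bilinear interpolation of lattice values described above can be sketched as follows. The function name `interpolate_copula` is hypothetical; it assumes that the lattice values are stored in a $(T+1) \times (T+1)$ array whose entry $(i,j)$ holds the estimate at $(i/T, j/T)$, including the zero row and column.

```python
import numpy as np


def interpolate_copula(grid, u, v):
    """Bilinear interpolation of copula values known on the lattice (i/T, j/T).

    grid: array of shape (T + 1, T + 1), grid[i, j] = value at (i/T, j/T).
    u, v: point of [0, 1]^2 at which the copula is evaluated.
    """
    grid = np.asarray(grid, dtype=float)
    T = grid.shape[0] - 1
    # integer parts of T*u and T*v, capped so that k + 1 stays on the lattice
    ku = min(int(np.floor(T * u)), T - 1)
    kv = min(int(np.floor(T * v)), T - 1)
    a, b = T * u - ku, T * v - kv          # local coordinates in [0, 1]
    return (grid[ku, kv] * (1.0 - a) * (1.0 - b)
            + grid[ku + 1, kv] * a * (1.0 - b)
            + grid[ku, kv + 1] * (1.0 - a) * b
            + grid[ku + 1, kv + 1] * a * b)


if __name__ == "__main__":
    T = 10
    lattice = np.arange(T + 1) / T
    independence = np.outer(lattice, lattice)   # C(u, v) = u * v on the lattice
    print(interpolate_copula(independence, 0.37, 0.82))
```

Since the product $uv$ is itself bilinear on each cell, interpolating the lattice values of the independence copula reproduces it exactly, which gives a simple check of the weights.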

Li et al. [304, 305] have provided some other insightful methods. One of
them relies on the use of Bernstein polynomials,
$$P_{i,n}(x) = \binom{n}{i} x^i (1-x)^{n-i} . \tag{5.8}$$
Defining
$$\hat C_B(u) = \sum_{i_1=1}^{T} \cdots \sum_{i_n=1}^{T} P_{i_1,T}(u_1) \cdots P_{i_n,T}(u_n) \cdot \hat C\left(\frac{i_1}{T}, \frac{i_2}{T}, \ldots, \frac{i_n}{T}\right) , \tag{5.9}$$
one obtains a copula which converges uniformly to $C$, almost surely as $T$ goes
to infinity (a weak form of the Stone-Weierstrass theorem).²

Although the method provides a smooth, infinitely differentiable copula, it comes
with two severe drawbacks:


• First, it is easy to show [138] that any differentiable copula in the
  neighborhood of (1,1) (or of (0,0)) has a vanishing coefficient of tail dependence
  $\lambda$ (see Chap. 4). Indeed, a necessary condition for $\lambda$ not to vanish is that
  the copula be non-differentiable in the neighborhood of (1,1).³ Thus, by
  construction, a nonparametric estimation of copulas using the interpolation
  method described above automatically forbids a correct estimation of
  the tail dependence parameter. Such an estimation amounts to projecting the
  copula onto the set of copulas with vanishing tail dependence.

• Second, the convergence of the derivatives of $\hat C_B$ toward the derivatives
  of $C$ is not a priori ensured. As a consequence, it is not possible to use
  these estimates to generate simulated data enjoying the same dependence
  structure as that of the sample (see Sect. 3.5). This is particularly harmful
  since one often has to resort to Monte Carlo simulations and bootstrap
  methods to assess portfolio risk or to value derivative assets. It is thus
  necessary to look for nonparametric estimators of both the copula and its
  derivatives.


Kernel Copula Estimator 


Smooth joint estimates of a copula and of its derivatives can be obtained
by using a kernel-based approach [168]. Still considering an $n$-dimensional
random vector $X$ with copula $C$, let us call its joint distribution function
$F$ and its marginal distribution functions $F_i$ such that $F(X_1,\ldots,X_n) =
C(F_1(X_1),\ldots,F_n(X_n))$. The most commonly used kernel is probably the
Gaussian kernel,
$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2} . \tag{5.10}$$
We present the general procedure detailed in [168] on this particular example.

² According to the Weierstrass approximation theorem, any continuous function
defined on an interval [a, b] can be uniformly approximated as closely as desired by
a polynomial function. The Stone-Weierstrass theorem generalizes the Weierstrass
approximation theorem in two directions by considering an arbitrary compact
Hausdorff space instead of a compact interval [a, b], and approximations with
elements from more general sets than polynomials.

³ Note that this condition is necessary but not sufficient as shown for instance
by the Gaussian copula which is not differentiable at (1,1) but nevertheless has
vanishing tail dependence.

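Let us first estimate the joint distribution of $X$. Given the sample of size
$T$, $\{(x_1(1), x_2(1), \ldots, x_n(1)), \ldots, (x_1(T), x_2(T), \ldots, x_n(T))\}$, the kernel
estimates of $F_i(x_i)$ and $F(x)$ are
$$\hat F_i(x_i) = \frac{1}{T} \sum_{k=1}^{T} \Phi\left(\frac{x_i - x_i(k)}{h_i}\right) , \tag{5.11}$$
and
$$\hat F(x) = \frac{1}{T} \sum_{k=1}^{T} \prod_{i=1}^{n} \Phi\left(\frac{x_i - x_i(k)}{h_i}\right) , \tag{5.12}$$
where
$$\Phi(x) = \int_{-\infty}^{x} \varphi(t)\, dt \tag{5.13}$$
and $(h_1,\ldots,h_n)$ is the bandwidth, a function of $T$ with value in $\mathbb{R}^n$ and
satisfying
$$h_i(T) > 0 , \quad \forall T, \ i \in \{1,\ldots,n\} , \tag{5.14}$$
$$\prod_{i=1}^{n} h_i(T) + \left[ T \cdot \prod_{i=1}^{n} h_i(T) \right]^{-1} \longrightarrow 0 , \quad \text{as } T \to \infty . \tag{5.15}$$
In practice, one usually chooses $h_i = \hat\sigma_i \cdot (4/3T)^{1/5}$, where $\hat\sigma_i$ denotes the
sample standard deviation of $\{x_i(1),\ldots,x_i(T)\}$.

Defining $\hat q$, the vector whose $i$th component is the $u_i$-quantile of $\hat F_i$:
$$\hat q_i = \inf\left\{x : \hat F_i(x) \ge u_i\right\} , \quad u_i \in (0,1) , \tag{5.16}$$
the kernel estimator of the copula $C$ is simply given by
$$\hat C(u_1,\ldots,u_n) = \hat F(\hat q) . \tag{5.17}$$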
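
The steps (5.11)-(5.17) can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation of [168]: it uses the Gaussian kernel, the rule-of-thumb bandwidth quoted above, and a numerical root search for the quantiles; the function name `kernel_copula` and the bracketing of the root by ten bandwidths are choices made for this example.

```python
import numpy as np
from scipy import optimize, stats


def kernel_copula(sample, u):
    """Kernel estimate of C(u) from a sample of shape (T, n), Gaussian kernel."""
    sample = np.asarray(sample, dtype=float)
    u = np.asarray(u, dtype=float)
    T, n = sample.shape
    # rule-of-thumb bandwidth h_i = sigma_i * (4 / (3 T))^(1/5)
    h = sample.std(axis=0, ddof=1) * (4.0 / (3.0 * T)) ** 0.2

    def marginal_cdf(x, i):
        # smoothed marginal distribution function, as in (5.11)
        return stats.norm.cdf((x - sample[:, i]) / h[i]).mean()

    q = np.empty(n)
    for i in range(n):
        # the smoothed cdf is continuous and increasing: solve F_i(x) = u_i
        lo = sample[:, i].min() - 10.0 * h[i]
        hi = sample[:, i].max() + 10.0 * h[i]
        q[i] = optimize.brentq(lambda x: marginal_cdf(x, i) - u[i], lo, hi)
    # smoothed joint distribution function evaluated at the quantiles, (5.12)
    return stats.norm.cdf((q - sample) / h).prod(axis=1).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((2000, 2))      # independent margins
    print(kernel_copula(data, [0.5, 0.5]))     # should be close to 0.25
```

The product kernel in the last line mirrors (5.12); a different kernel or bandwidth rule would only change the two lines where `h` and `stats.norm.cdf` appear.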


Under mild regularity conditions, this kernel estimator is asymptotically 
Gaussian. From Proposition 1 in [168], one can show that 


a 1/2
(7. T*) (Clu) -C(u)) - N (0,C(u)) . (5.18)
This asymptotic behavior holds even when the sample is not iid, provided
that the underlying process satisfies some strong mixing conditions, roughly 
speaking (see [168] for details). Therefore, this method can be applied to 
financial asset returns, which are known to exhibit volatility clustering among 
other time dependence patterns. 

By construction of the kernel estimator, it can be differentiated with
respect to the $u_i$'s. It is thus easy to obtain an estimator of a partial derivative
of the copula with respect to one (or more) of the variables. For instance, the
kernel estimator of the first order partial derivative of $C$ with respect to $u_i$ is
$$\frac{\widehat{\partial C}(u)}{\partial u_i} = \frac{\partial_i \hat F\left(\hat q(u)\right)}{\hat f_i\left(\hat q_i(u_i)\right)} , \tag{5.19}$$
where $\hat f_i$ denotes the kernel estimate of the marginal density of $X_i$,
$$\hat f_i(x_i) = \frac{1}{T \cdot h_i} \sum_{k=1}^{T} \varphi\left(\frac{x_i - x_i(k)}{h_i}\right) , \tag{5.20}$$
and $\partial_i \hat F$ is the partial derivative of $\hat F$ with respect to its $i$th variable.
Again, under mild regularity conditions, it can be shown that this estima- 
tor is asymptotically Gaussian, so that 


n 1/2 se 
tr,.) . (acm _ actu) 1 aC(u) 
q 1g ( ee N (05a ou) 2 


Applying the same kind of arguments, one can estimate the higher order
partial derivatives of the copula $C$:
$$\frac{\widehat{\partial^k C}(u)}{\partial u_{i_1} \cdots \partial u_{i_k}} = \frac{\partial_{i_1,\ldots,i_k} \hat F\left(\hat q(u)\right)}{\hat f_{i_1}\left(\hat q_{i_1}(u_{i_1})\right) \cdots \hat f_{i_k}\left(\hat q_{i_k}(u_{i_k})\right)} , \tag{5.22}$$
where all the $i_j$'s are assumed different and $k \le n$. As a consequence, it
becomes possible to simulate random variables with copula $\hat C$, by using the
algorithm detailed in Sect. 3.5.2.
When $k = n$, one obtains the kernel estimator of the copula density:
$$\hat c(u) = \frac{\hat f(\hat q)}{\prod_{i=1}^{n} \hat f_i(\hat q_i)} , \tag{5.23}$$
where $\hat f$ denotes the kernel estimate of the joint density of $X$,
$$\hat f(x) = \frac{1}{T \cdot \prod_{i=1}^{n} h_i} \sum_{k=1}^{T} \prod_{i=1}^{n} \varphi\left(\frac{x_i - x_i(k)}{h_i}\right) . \tag{5.24}$$

Fig. 5.1. Contour plot of the copula density estimated by the kernel method for
the daily returns of the couple constituted of the German Mark (u variable) and the
Japanese Yen (v variable) over the time interval from 1 May, 1973 to 2 November,
2001 (left panel) and for the couple made of General Motors (u variable) and Procter
& Gamble (v variable) over the time period from 3 July, 1962 to 29 December, 2000
(right panel). The German Mark data has been reconstructed from the Euro data
after 31 December, 1999

Two examples of copula densities estimated by the kernel method are shown
in Fig. 5.1. Observe that the level curves of the left panel are rather similar to
those of Fig. 3.3, which depicts the contour plot of a Student copula. This
suggests that a Student's copula with a moderate number of
degrees of freedom is a possible candidate for modeling dependencies between
the returns of foreign exchange rates.⁴ For stock returns, the situation is less
clear, even if one could surmise that a Student copula with a large number of
degrees of freedom could be a reasonable model.

To sum up this paragraph on kernel estimators, let us stress that,
notwithstanding their seeming attractiveness, they have a severe drawback as they
require a very large amount of data. As an illustration, in order to obtain the
two pictures of Fig. 5.1, we used between 7,000 and 10,000 data points. With
fewer than 2,500-5,000 points, one obtains unreliable estimates in most cases,
showing that the kernel estimators behave badly for small samples. Therefore,
with daily returns, an accurate non-parametric estimate of the copula requires
between 30 and 40 years of data. Over such a long time period, it is far from
given that the dependence structure remains stationary, which may undermine
the whole effort.


5.1.2 Semiparametric Estimation 


When the number of observations is not large enough and/or when one has
a sufficiently accurate idea of the true model, it is in general more profitable
to apply a parametric or semiparametric estimation method. By parametric,
we mean a method based on an entirely parametric model: in such a case,
we assume that the true model belongs to a given family of multivariate
distributions, i.e., a family of copulas plus a family of univariate distributions for
each individual marginal law. Such a modeling approach requires a very
accurate knowledge of the true distribution and can lead to bad estimations of
the copula parameters if the marginals are misspecified. Thus, when in doubt
concerning the univariate marginal distributions of the data (see in this
respect the cautionary study presented in Chap. 2), a semiparametric approach
may be preferable. Indeed, in contrast with fully parametric methods,
semiparametric techniques use a parametric representation only for the copula. No
assumption is made concerning the marginal distributions, which may either
be estimated nonparametrically or not even come into play at all, as we shall
see now.

⁴ Of course, this statement should be formally tested by using rigorous statistical
techniques. See the following sections.


Estimation Based on Concordance Measures 


Basically, two kinds of semiparametric methods exist. The simplest one is
based upon the nonparametric estimation of parameters which only depend on
the copula. Concordance measures, such as Kendall's tau and Spearman's rho
for instance, provide good examples. They can be easily estimated and, once
a parametric family of copulas has been retained, one just has to express the
parameters of the copula as functions of these estimated quantities. It is the
stance taken by Oakes [375] to estimate the parameter $\theta$ of a Clayton copula
(see (3.43)). Table 4.1 gives the following relation between the parameter $\theta$
and Kendall's tau:
$$\theta = \frac{2\tau}{1-\tau} . \tag{5.25}$$
Therefore, a natural estimator of $\theta$ is:
$$\hat\theta_T = \frac{2 \hat\tau_T}{1 - \hat\tau_T} , \tag{5.26}$$
where $\hat\tau_T$ denotes the sample version of Kendall's tau, based on the bivariate
sample of size $T$: $\{(x_1,y_1),\ldots,(x_T,y_T)\}$. Let us recall that
$$\hat\tau_T = \frac{C - D}{C + D} , \tag{5.27}$$
where $C$ (resp. $D$) denotes the number of concordant (resp. discordant) pairs,
i.e., such that $(x_i - x_j) \cdot (y_i - y_j) > 0$ (resp. $< 0$).

Based on relation (4.37), a similar approach can be applied to estimate
the shape parameter $\rho$ of an elliptical copula. One then obtains
$$\hat\rho_T = \sin\left(\frac{\pi}{2} \hat\tau_T\right) . \tag{5.28}$$
As an illustration, let us consider again the two samples of the daily returns
of the German Mark and the Japanese Yen on the one hand, and of General
Motors and Procter & Gamble on the other hand. Assuming that their copula
belongs to the class of elliptical copulas, we can apply this method to infer
the value of the shape parameter $\rho$ for each pair of assets. For the first one
(German Mark/Japanese Yen), we obtain: $\hat\tau_T = 0.37$ so that $\hat\rho_T = 0.54$, while
for the second one: $\hat\tau_T = 0.18$ and therefore $\hat\rho_T = 0.29$. This shows that the
dependence is stronger between the pair of currencies than between the pair
of stocks.
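
The two moment-type estimators, $\hat\theta_T = 2\hat\tau_T/(1-\hat\tau_T)$ for the Clayton copula and $\hat\rho_T = \sin(\pi\hat\tau_T/2)$ for elliptical copulas, are straightforward to code. The sketch below relies on `scipy.stats.kendalltau`, which coincides with $(C-D)/(C+D)$ when the sample has no ties; the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def clayton_theta_from_tau(tau):
    """Clayton parameter implied by Kendall's tau: theta = 2 tau / (1 - tau)."""
    return 2.0 * tau / (1.0 - tau)


def elliptical_rho_from_tau(tau):
    """Shape parameter of an elliptical copula: rho = sin(pi * tau / 2)."""
    return np.sin(0.5 * np.pi * tau)


def estimate_from_sample(x, y):
    """Return (tau_hat, theta_hat, rho_hat) for a bivariate sample without ties."""
    tau = stats.kendalltau(x, y)[0]
    return tau, clayton_theta_from_tau(tau), elliptical_rho_from_tau(tau)


if __name__ == "__main__":
    # Rounded values of Kendall's tau quoted in the text for the two pairs
    for tau in (0.37, 0.18):
        print(tau, elliptical_rho_from_tau(tau))
```

Applied to the rounded values of $\hat\tau_T$ quoted in the text, the formula returns values within about 0.01 of the reported $\hat\rho_T$; the small gap is compatible with the rounding of $\hat\tau_T$ to two digits.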

This method is particularly attractive due to its simplicity but is a bit
naive. While it provides very simple and robust estimators, these
estimators are not always very accurate. This justifies turning to more elaborate
methods, such as that developed by Genest et al. [197], which relies on the
maximization of a pseudo likelihood.


Pseudo Maximum Likelihood Estimation 


Let us still consider a sample of size $T$, $\{(x_1(1), x_2(1), \ldots, x_n(1)), \ldots, (x_1(T),
x_2(T), \ldots, x_n(T))\}$, drawn from a common distribution $F$ with copula $C$ and
margins $F_i$. By definition of the copula, the random vector $U$ whose $i$th
component is given by $U_i = F_i(X_i)$ has a distribution function equal to $C$.
Assuming that the copula $C = C(\cdot\,; \theta_0)$ belongs to the family $\{C(u_1,\ldots,u_n;\theta);\ \theta \in
\Theta \subset \mathbb{R}^p\}$, where $\theta$ denotes the vector parameterizing the copula, the function
$$\ln L = \sum_{i=1}^{T} \ln c\left(F_1(x_1(i)), \ldots, F_n(x_n(i)); \theta\right) , \tag{5.29}$$
where $c(\cdot\,;\theta)$ denotes the density of $C(\cdot\,;\theta)$, provides the log-likelihood of the
sequence $\{(u_1(k) = F_1(x_1(k)), \ldots, u_n(k) = F_n(x_n(k)))\}_{k=1}^{T}$. Note that this
sequence is independently and identically distributed provided that the $x_i(k)$'s
are independent and identically distributed realizations.

Since the marginal distributions are generally unknown, and when no
parametric model seems available, it is reasonable to use the empirical marginal
distribution functions $\hat F_i$ defined by (5.2) to obtain an estimator of $U$,
$$\hat U = \left(\hat F_1(X_1), \ldots, \hat F_n(X_n)\right) . \tag{5.30}$$
Then, one derives the pseudo-sample $\{(\hat u_1(k),\ldots,\hat u_n(k))\}_{k=1}^{T}$, where $\hat u_i(k) =
\hat F_i(x_i(k))$, which is not iid even if the $x_i(k)$'s are. Hence, substituting the
$\hat u_i(k)$'s for the $u_i(k)$'s in the log-likelihood function (5.29), one obtains the
pseudo log-likelihood of the model, based on the sample $\{(x_1(1),\ldots,x_1(T)),
\ldots, (x_n(1),\ldots,x_n(T))\}$:
$$\ln \tilde L = \sum_{i=1}^{T} \ln c\left(\hat F_1(x_1(i)), \ldots, \hat F_n(x_n(i)); \theta\right) . \tag{5.31}$$
Finally, the parameter vector $\theta$ is estimated by maximizing the pseudo
log-likelihood, so that
$$\hat\theta_T = \arg\max_{\theta} \ln \tilde L\left(\{x_1(i),\ldots,x_n(i)\}; \theta\right) . \tag{5.32}$$
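
To make the procedure concrete, here is a minimal sketch of the pseudo maximum likelihood estimator for a bivariate Clayton copula, chosen only because its density has a simple closed form. The function names are hypothetical; the ranks are divided by $T+1$ rather than $T$, a common safeguard that keeps the pseudo-observations strictly inside $(0,1)$, and the search interval for $\theta$ is an arbitrary assumption.

```python
import numpy as np
from scipy import optimize


def pseudo_observations(sample):
    """Rescaled ranks u_i(k) = rank / (T + 1), computed margin by margin."""
    sample = np.asarray(sample, dtype=float)
    T = sample.shape[0]
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1   # assumes no ties
    return ranks / (T + 1.0)


def clayton_log_density(u, v, theta):
    """Log-density of the bivariate Clayton copula, theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (np.log1p(theta)
            - (1.0 + theta) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))


def fit_clayton_pml(sample, bounds=(1e-3, 20.0)):
    """Maximize the pseudo log-likelihood over theta within the given bounds."""
    u = pseudo_observations(sample)

    def neg_pseudo_loglik(theta):
        return -clayton_log_density(u[:, 0], u[:, 1], theta).sum()

    result = optimize.minimize_scalar(neg_pseudo_loglik, bounds=bounds,
                                      method="bounded")
    return result.x


if __name__ == "__main__":
    # Simulate a Clayton sample with theta = 2 by conditional inversion
    rng = np.random.default_rng(1)
    theta = 2.0
    u = rng.uniform(size=2000)
    w = rng.uniform(size=2000)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    print(fit_clayton_pml(np.column_stack([u, v])))
```

Because only ranks enter the estimator, applying any increasing transformation to either column of the sample leaves the estimate unchanged.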


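Under the usual technical regularity conditions ensuring the consistency
and asymptotic normality of maximum likelihood estimators, Genest et al.
[197] have shown that $\hat\theta_T$ is a consistent estimator of $\theta_0$ and that it is
asymptotically Gaussian (see also Appendix 5.B):
$$\sqrt{T} \left(\hat\theta_T - \theta_0\right) \xrightarrow{d} N\left(0, \Sigma^2\right) \tag{5.33}$$
with $\Sigma^2 = I(\theta_0)^{-1} + I(\theta_0)^{-1}\, \Omega\, I(\theta_0)^{-1}$, where $I(\theta_0)$ denotes Fisher's
information matrix at $\theta_0$:
$$\left[I(\theta_0)\right]_{ij} = E\left[ \left. \frac{\partial \ln c(U;\theta)}{\partial \theta_i} \frac{\partial \ln c(U;\theta)}{\partial \theta_j} \right|_{\theta = \theta_0} \right] , \tag{5.34}$$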

and, with p = dim8, 


P Pp 
2:3 = Cov > Wri(Ur), 9) Weg (Un)| (5.35) 
k=l k=l 
where 
0? In c(u; @) " 
i = lig gs: —— 10") 
Wri(Ur) a (Urs) —B6.5u; — |o-o0 dC (u; 0°) (5.36) 


These results rely on a straightforward application of the consistency and
asymptotic normality of functionals of multivariate rank statistics derived by
Ruymgaart et al. [423, 424] and Rüschendorf [422]. Since $\Omega$ is a positive
definite matrix, the covariance matrix of the estimator $\hat\theta_T$ is larger⁵ than
it would be, were the marginal distributions $F_i$ perfectly known. Indeed, in
such a case, the covariance matrix of the estimator would be nothing but the
inverse of Fisher's information matrix $I(\theta_0)^{-1}$.

As an illustration, let us fit the two samples considered in Sect. 5.1.1,
namely the daily returns of the FX rate of the German Mark and of the
Japanese Yen over the time interval from 1 May, 1973 to 2 November, 2001
and the daily returns of the couple of stocks (General Motors; Procter &
Gamble) over the time period from 3 July, 1962 to 29 December, 2000. As
mentioned above, the kernel estimates (see Fig. 5.1) of the copulas of these two
couples suggest that the Student copula could provide a reasonable description
of their dependence structure, at least for the pair of currencies. The pseudo
log-likelihood of these samples for a Student copula with $\nu$ degrees of freedom
and shape matrix $\rho$ can be straightforwardly derived from (3.37) p. 110. No
closed form for $\hat\rho$ and $\hat\nu$ can be obtained. One has to maximize the pseudo
log-likelihood with a numerical procedure. Figure 5.2 depicts the contour plot
of the Student copula maximizing the pseudo likelihood for each sample. For
the sample (German Mark; Japanese Yen), we obtain the following estimates
for the shape parameter and the number of degrees of freedom respectively:
$\hat\rho = 0.54$ and $\hat\nu = 5.82$. Comparing the left panels of Figs. 5.1 and 5.2, we
observe that the Student copula estimated from the data seems reasonably
close to the kernel estimate of the copula, suggesting that the copula model
is realistic in this case. For the couple (General Motors; Procter & Gamble),
we find $\hat\rho = 0.29$ and $\hat\nu = 5.92$. However, when comparing the right panels of
Figs. 5.1 and 5.2, one can observe a clear discrepancy between the two models
and it is doubtful that the Student copula provides a good representation
of the dependence in this case. Settling this question requires assessing the
goodness-of-fit of the model, which will be discussed in Sect. 5.2.

⁵ We say that a matrix A is larger than a matrix B if their difference A - B is a
positive definite matrix. In particular, it implies that the diagonal terms of A are
larger one to one than the diagonal terms of B. In the present case, this means
that the variance of each component of the pseudo maximum likelihood estimator
is larger than the variance of each component of the actual maximum likelihood
estimator, yielding less accurate estimates of the parameter $\theta_0$.

Fig. 5.2. Contour plot of the Student copula maximizing the pseudo
likelihood (5.31) for the daily returns of the couple German Mark/Japanese Yen over
the time interval from 1 May, 1973 to 2 November, 2001 (left panel) and for the
couple General Motors/Procter & Gamble over the time period from 3 July, 1962
to 29 December, 2000 (right panel)

In the mean time, let us stress two important points concerning the prac- 
tical implementation of the pseudo maximum likelihood estimation method. 


e It is convenient to replace the empirical distribution function F;(-), defined 


T 
by (5.2), by Pail -). These two quantities are asymptotically equivalent 
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but the use of the latter allows to prevent potential unboundedness of the 
pseudo log-likelihood when one of the &;’s tends to one. 

• Any maximization algorithm requires an initialization. The choice of the
starting point is not innocuous since the performance of the algorithm
can depend, to a large extent, on it. For any elliptical copula, estimating
Kendall’s tau and applying relation (5.28) yields a good
starting point. In fact, the estimation of $\rho_{ij}$ from $\tau_{ij}$ often provides such
a good starting point that the pseudo maximum likelihood estimate of
$\rho$ does not significantly improve on it [350]. Our examples confirm this
point: with both methods, we have obtained the same values (within their
confidence intervals). In addition, the first estimation method is much faster
than the second one. These remarks are especially important when one deals
with large portfolios for which the numerical maximization of the pseudo
likelihood becomes particularly tricky (and time consuming). Therefore,
in such a case, the non-parametric estimation of $\rho$ by relation (5.28) is
probably the best method. Then, one can obtain an accurate estimate
of the number $\nu$ of degrees of freedom by maximization of the pseudo
likelihood with respect to this single parameter only,

$$\hat{\nu}_T = \arg\max_{\nu} L(\{x\}; \hat{\rho}_T, \nu) , \tag{5.37}$$

where $\hat{\rho}_T$ denotes the non-parametric estimator of $\rho$ obtained from (5.28).
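
The second point can be illustrated by a minimal Python sketch for the bivariate Student copula. It assumes that relation (5.28) is the standard link $\rho = \sin(\pi\tau/2)$ between Kendall’s tau and the shape parameter of an elliptical copula, and it relies on NumPy and SciPy. The function names and the search interval for $\nu$ are illustrative choices of ours, not part of the method described above.

```python
import numpy as np
from scipy import optimize, special, stats


def pseudo_observations(x):
    """Rescaled ranks T/(T+1) * F_hat(x), which stay strictly inside (0, 1)."""
    x = np.asarray(x, dtype=float)
    ranks = np.column_stack([stats.rankdata(col) for col in x.T])
    return ranks / (x.shape[0] + 1.0)


def student_copula_loglik(u, rho, nu):
    """Pseudo log-likelihood of a bivariate Student copula."""
    q = stats.t.ppf(u, df=nu)  # Student quantiles, shape (T, 2)
    quad = (q[:, 0] ** 2 - 2.0 * rho * q[:, 0] * q[:, 1] + q[:, 1] ** 2) / (1.0 - rho ** 2)
    # Log-density of the bivariate Student distribution at the quantiles.
    log_joint = (special.gammaln(0.5 * (nu + 2.0)) - special.gammaln(0.5 * nu)
                 - np.log(nu * np.pi) - 0.5 * np.log(1.0 - rho ** 2)
                 - 0.5 * (nu + 2.0) * np.log1p(quad / nu))
    # Copula density = joint density / product of the marginal densities.
    log_margins = stats.t.logpdf(q, df=nu).sum(axis=1)
    return float(np.sum(log_joint - log_margins))


def fit_student_copula(x):
    """rho from Kendall's tau, then nu by a one-dimensional maximization."""
    x = np.asarray(x, dtype=float)
    u = pseudo_observations(x)
    tau, _ = stats.kendalltau(x[:, 0], x[:, 1])
    rho_hat = np.sin(0.5 * np.pi * tau)  # assumed form of relation (5.28)
    res = optimize.minimize_scalar(
        lambda nu: -student_copula_loglik(u, rho_hat, nu),
        bounds=(2.1, 100.0),  # illustrative search interval for nu
        method="bounded")
    return rho_hat, float(res.x)
```

Only a scalar optimization remains, which is why this route scales to large portfolios: in higher dimension the pairwise Kendall’s taus fill the whole matrix $\hat{\rho}_T$ before $\nu$ is fitted.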


To conclude on the two semiparametric estimation methods that we have 
presented, both methods have their pros and cons. For low dimensional prob- 
lems, the pseudo maximum likelihood estimator is probably the best. Its vari- 
ance is usually lower: Genest et al. [197] report that the variance of this 
estimator is smaller than the variance of the estimator based on Kendall’s tau 
by 10-40% for Clayton’s copula (depending on the value of parameter $\theta$). In
contrast, when the dimension of the problem is large, the pseudo maximum 
likelihood method becomes time consuming and less efficient. 


5.1.3 Parametric Estimation 


While many procedures exist, we will only focus on maximum likelihood meth- 
ods. Among those, two main approaches can be distinguished: the one-step 
maximum likelihood estimation and the two-step maximum likelihood esti- 
mation. 

Given a multivariate distribution function $F(x;\theta)$ depending on the vector
of parameters $\theta \in \Theta \subset \mathbb{R}^p$, which can be represented as
$F(x;\theta) = C\left(F_1(x_1;\theta),\ldots,F_n(x_n;\theta);\theta\right)$, and given a sample of size $T$,
$\{(x_1(1), x_2(1),\ldots,x_n(1)),\ldots,(x_1(T), x_2(T),\ldots,x_n(T))\}$, the log-likelihood of the model is

$$\ln L(\{x\};\theta) = \sum_{i=1}^{T} \ln c\left(F_1(x_1(i);\theta),\ldots,F_n(x_n(i);\theta);\theta\right) + \sum_{i=1}^{T} \ln f_1(x_1(i);\theta) + \cdots + \sum_{i=1}^{T} \ln f_n(x_n(i);\theta) , \tag{5.38}$$

where, as usual, $c(\cdot;\theta)$ denotes the density of $C(\cdot;\theta)$ and the $f_i(\cdot;\theta)$’s are
the densities of the marginal distribution functions $F_i(\cdot;\theta)$. The one-step
maximum likelihood estimator of $\theta$ is then

$$\hat{\theta}_T = \arg\max_{\theta} \ln L(\{x\};\theta) . \tag{5.39}$$


Under the usual regularity conditions, it enjoys the properties of consistency 
and asymptotic normality, with its asymptotic covariance matrix given by the 
inverse of Fisher’s information matrix. 

Consider the dependence structure for a sample, supplied by the Insur- 
ance Service Office, of indemnity claims of insurance companies consisting of 
indemnity payment (or loss) and allocated loss adjustment expenses (ALAE). 
Applying the one-step maximum likelihood method leading to (5.39), Frees 
and Valdez [183] and Klugman and Parsa [272] have shown that the depen- 
dence of this sample can be reasonably modeled by Gumbel’s or Frank’s cop- 
ula. We should stress that, for this procedure to work properly, the choice of 
the marginals is crucial. It is thus appropriate to model each marginal distri- 
bution function and perform a first maximum likelihood estimation of their 
corresponding parameters. Then, together with the choice of a suitable copula, 
these preliminary estimates of the parameters of the marginal distributions 
provide useful starting points to globally maximize (5.38) numerically. 

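Pushing this reasoning further, consider the situation where one can split
the parameter vector $\theta$ under the form $\theta = (\alpha, \beta_1,\ldots,\beta_n)$ so that

$$F(x;\theta) = C\left(F_1(x_1;\beta_1),\ldots,F_n(x_n;\beta_n);\alpha\right) , \tag{5.40}$$

i.e., the marginal distributions are functions of independent sets of parameters.
As a consequence, the log-likelihood reads

$$\ln L(\{x\};\alpha,\beta_1,\ldots,\beta_n) = \sum_{i=1}^{T} \ln c\left(F_1(x_1(i);\beta_1),\ldots,F_n(x_n(i);\beta_n);\alpha\right) + \sum_{i=1}^{T} \ln f_1(x_1(i);\beta_1) + \cdots + \sum_{i=1}^{T} \ln f_n(x_n(i);\beta_n) . \tag{5.41}$$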


Thus, instead of looking for the global maximum

$$\max \ln L(\{x\};\alpha,\beta_1,\ldots,\beta_n) \tag{5.42}$$

over $(\alpha,\beta_1,\ldots,\beta_n)$, one can perform a two-step (in fact $(n+1)$-step)
maximization of the likelihood:

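$$\hat{\beta}_{1,T} = \arg\max_{\beta_1} \sum_{i=1}^{T} \ln f_1(x_1(i);\beta_1) \tag{5.43}$$

$$\vdots$$

$$\hat{\beta}_{n,T} = \arg\max_{\beta_n} \sum_{i=1}^{T} \ln f_n(x_n(i);\beta_n) \tag{5.44}$$

$$\hat{\alpha}_T = \arg\max_{\alpha} \sum_{i=1}^{T} \ln c\left(F_1(x_1(i);\hat{\beta}_{1,T}),\ldots,F_n(x_n(i);\hat{\beta}_{n,T});\alpha\right) . \tag{5.45}$$

One can prove that the two-step estimator $\hat{\theta}_T = (\hat{\alpha}_T,\hat{\beta}_{1,T},\ldots,\hat{\beta}_{n,T})$ is
consistent and asymptotically Gaussian [248, 372, 492],

$$\sqrt{T}\,(\hat{\theta}_T - \theta_0) \xrightarrow{\ \text{law}\ } \mathcal{N}\left(0, A^{-1} B \,(A^{-1})^{t}\right) , \tag{5.46}$$

where $A^{-1} B \,(A^{-1})^{t}$ is the inverse of Godambe’s information matrix,⁶ with

$$A = \begin{pmatrix}
\mathrm{E}\left[\partial^2_{\beta_1\beta_1} \ln f_1|_{\theta_0}\right] & 0 & 0 & 0 \\
0 & \ddots & 0 & 0 \\
0 & 0 & \mathrm{E}\left[\partial^2_{\beta_n\beta_n} \ln f_n|_{\theta_0}\right] & 0 \\
\mathrm{E}\left[\partial^2_{\beta_1\alpha} \ln c|_{\theta_0}\right] & \cdots & \mathrm{E}\left[\partial^2_{\beta_n\alpha} \ln c|_{\theta_0}\right] & \mathrm{E}\left[\partial^2_{\alpha\alpha} \ln c|_{\theta_0}\right]
\end{pmatrix} , \tag{5.47}$$

and

$$B = \mathrm{Cov}\left[\left(\partial_{\beta_1} \ln f_1|_{\theta_0},\ldots,\partial_{\beta_n} \ln f_n|_{\theta_0},\partial_{\alpha} \ln c|_{\theta_0}\right)\right] . \tag{5.48}$$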


While asymptotically less efficient than the one-step estimator, this approach 
has the obvious advantage of reducing the dimensionality of the problem, 
which is particularly useful when one has to resort to a numerical maximiza- 
tion. 
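
A minimal Python sketch of the two-step procedure (5.43)-(5.45) is given below, under assumptions that are ours rather than part of the general method: the marginals are taken to be Gaussian (so that each $\beta_i$ is a mean and a standard deviation) and the copula is a bivariate Gaussian copula (so that $\alpha$ reduces to a single correlation coefficient). The function names are illustrative.

```python
import numpy as np
from scipy import optimize, stats


def gaussian_copula_loglik(u, rho):
    """Log-likelihood of a bivariate Gaussian copula with correlation rho."""
    y = stats.norm.ppf(u)
    y1, y2 = y[:, 0], y[:, 1]
    r2 = rho ** 2
    return float(np.sum(-0.5 * np.log(1.0 - r2)
                        - (r2 * (y1 ** 2 + y2 ** 2) - 2.0 * rho * y1 * y2)
                        / (2.0 * (1.0 - r2))))


def two_step_fit(x):
    """Two-step (inference functions for margins) estimation, bivariate case."""
    x = np.asarray(x, dtype=float)
    n = x.shape[1]
    # Step 1, as in (5.43)-(5.44): fit each marginal separately.
    betas = [stats.norm.fit(x[:, i]) for i in range(n)]
    # Map the data to the unit square with the fitted marginals.
    u = np.column_stack([stats.norm.cdf(x[:, i], *betas[i]) for i in range(n)])
    u = np.clip(u, 1e-12, 1.0 - 1e-12)  # guard against infinite quantiles
    # Step 2, as in (5.45): maximize the copula likelihood over alpha only.
    res = optimize.minimize_scalar(
        lambda r: -gaussian_copula_loglik(u, r),
        bounds=(-0.99, 0.99), method="bounded")
    return betas, float(res.x)
```

Each maximization involves only a few parameters, which is the dimensionality reduction mentioned above. With samples of different lengths, step 1 could use each full marginal sample and step 2 only their intersection.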

⁶ Godambe’s information matrix was introduced in the context of inference
functions (or estimating equations) [205, 354].

In practice, one often has to deal with samples of different lengths. This
may occur for instance when considering simultaneously mature and emerging
markets with different lifespans, or market returns together with the returns of
a company which has only recently been listed on the stock exchange or
which has defaulted, or foreign exchange rates where one of the currencies of
interest has only a short history, such as the Euro. In such a case, the two-step
method is much better than the one-step method. The latter requires using a
data set which is the intersection of all the marginal samples, often leading to
a significant loss of efficiency in the estimation of the parameters of the marginal
distributions. In contrast, the two-step method uses the whole set of samples
for the estimation of the marginal parameters and restricts to the intersection of
the marginal samples only for the estimation of the parameters of the copula.
This two-step estimator is still consistent and asymptotically Gaussian. Its
asymptotic variance can be derived from (5.47)-(5.48), by accounting for the
different lengths of the marginal samples (see Patton [380]). While the one-step
estimator remains asymptotically more efficient than the two-step
estimator, Patton reports that the accuracy of the two-step estimator is much
better than that of the one-step estimator when the size of the intersection
of the marginal samples is small.


5.1.4 Goodness-of-Fit Tests 


Many different tests have been developed to check the goodness of fit of a
copula. The simplest approach uses the property that, under the
null hypothesis that $C(u_1,\ldots,u_n)$ is the right copula, the set of random
variables

$$U_1,\; C_2(U_2|U_1),\; \ldots,\; C_n(U_n|U_1,\ldots,U_{n-1}) \tag{5.49}$$

with

$$C_k(u_k|u_1,\ldots,u_{k-1}) = \frac{\partial_{u_1}\cdots\partial_{u_{k-1}} C_k(u_1,\ldots,u_k)}{\partial_{u_1}\cdots\partial_{u_{k-1}} C_{k-1}(u_1,\ldots,u_{k-1})} \tag{5.50}$$

are identically, uniformly, and independently distributed. This property has
already been used in Sect. 3.5.2 to provide an algorithm for the generation of
random variables with a given copula $C$. Thus, testing the null hypothesis is
equivalent to testing that the sample of $T$ vectors

$$\left\{ C_n\left(\hat{u}_n(t)|\hat{u}_1(t),\ldots,\hat{u}_{n-1}(t)\right),\ldots,C_2\left(\hat{u}_2(t)|\hat{u}_1(t)\right),\hat{u}_1(t) \right\}_{t=1}^{T} \tag{5.51}$$

is drawn from a population of uniform random vectors with independent
components. Such tests date back to [121, 122]. In the same vein, the more recent
Bhattacharya-Matusita-Hellinger dependence metric discussed in Chap. 4 can
also be used [214]. It allows one in particular to account for censored data [272],
which is particularly useful when dealing with insurance data. One can also
focus on a restricted area of the unit hypercube by using hit tests [381, 380].

Other alternatives consist in testing the significance of the distance in $L^p$,
for some $p$, between the null copula $C$ and the estimated copula $\hat{C}$,

$$\int_{[0,1]^n} \left| C(u) - \hat{C}(u) \right|^p du , \tag{5.52}$$

but the statistical properties of such tests are rather poor [169]. A simpler
approach focuses on the discrepancy between the fitted copula and the null
copula on the main diagonal only by use of the $K$ function:

$$K(z) = \Pr\left[C(U_1,\ldots,U_n) \le z\right] . \tag{5.53}$$

Since $K$ is a univariate distribution function, Kolmogorov or Anderson-Darling
tests can be applied.

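In fact, this last approach is particularly interesting when one deals with
Archimedean copulas since $K$ can be shown to admit the simple closed-form
expression [35]

$$K(z) = z + \sum_{k=1}^{n-1} (-1)^k \frac{\varphi(z)^k}{k!}\, \chi_{k-1}(z) , \tag{5.54}$$

where $\varphi(\cdot)$ denotes the generator of the copula and

$$\chi_k(z) = \frac{\partial_z \chi_{k-1}(z)}{\partial_z \varphi(z)} , \quad \text{with} \quad \chi_0(z) = \left[\partial_z \varphi(z)\right]^{-1} . \tag{5.55}$$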
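
As a quick check of (5.54)-(5.55), consider the bivariate case $n = 2$, for which the sum reduces to its first term. Assuming the common parametrization $\varphi(t) = (t^{-\theta} - 1)/\theta$ of the generator of Clayton’s copula (an assumption on our part, since the generator is not restated in this section), one has $\partial_z \varphi(z) = -z^{-\theta-1}$ and therefore

$$K(z) = z - \varphi(z)\,\chi_0(z) = z - \frac{\varphi(z)}{\partial_z \varphi(z)} = z + \frac{z\left(1 - z^{\theta}\right)}{\theta} .$$

This expression satisfies $K(0) = 0$ and $K(1) = 1$, as a distribution function on $[0,1]$ must.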


5.2 Description of Financial Data in Terms 
of Gaussian Copulas 


Section 3.6 has discussed the importance of the Gaussian copula for financial
modeling. We now review the empirical tests of the hypothesis, denoted $H_0$,
that the Gaussian copula is the correct description of the dependence between
financial assets. After summarizing the testing procedure developed in [334],
we describe the results.


5.2.1 Test Statistics and Testing Procedure 


Let us first derive the test statistics which will allow us to reject or not reject
the null hypothesis $H_0$. The following proposition, whose proof is given in
Appendix 5.A, can be stated.


Proposition 5.2.1. Assuming that the $N$-dimensional random vector $X =
(X_1,\ldots,X_N)$ with joint distribution function $F$ and marginals $F_i$ satisfies
the null hypothesis $H_0$, then the variable

$$z^2 = \sum_{i,j=1}^{N} \Phi^{-1}\left(F_i(X_i)\right) (\rho^{-1})_{ij}\, \Phi^{-1}\left(F_j(X_j)\right) , \tag{5.56}$$

where the matrix $\rho$ is

$$\rho_{ij} = \mathrm{Cov}\left[\Phi^{-1}\left(F_i(X_i)\right), \Phi^{-1}\left(F_j(X_j)\right)\right] , \tag{5.57}$$

follows a $\chi^2$-distribution with $N$ degrees of freedom.


The testing procedure based on this result is now described for $N = 2$
assets. The case $N = 2$ is not as restrictive as it might appear at first sight.
For portfolio analysis and risk management purposes, larger baskets of
assets should be considered, and the testing procedure described here can indeed
be applied to any number of assets. It is only for the sake of simplicity of
the exposition that the presentation is restricted to the bivariate case.

Let us consider two financial time series of size $T$: $\{x_1(1),\ldots,x_1(t),\ldots,
x_1(T)\}$ and $\{x_2(1),\ldots,x_2(t),\ldots,x_2(T)\}$. We assume that the vectors $x(t) =
(x_1(t), x_2(t))$, $t \in \{1,\ldots,T\}$, are independent and identically distributed with
distribution $F$, which implies that the variables $x_1(t)$ (respectively $x_2(t)$),
$t \in \{1,\ldots,T\}$, are also independent and identically distributed, with
distribution $F_1$ (respectively $F_2$). We immediately note that this assumption of
independently distributed data is not very realistic. It is well-known that daily
returns are uncorrelated but that their volatility exhibits long-range dependence.
A natural approach would then be to filter the data with an ARCH or
GARCH process and then apply the testing procedure to the residuals. This
approach will be discussed in Sect. 5.3.4 and we do not pursue this further
here.

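The empirical cumulative distribution $\hat{F}_i$ of each variable $X_i$ is given by

$$\hat{F}_i(x_i) = \frac{1}{T} \sum_{k=1}^{T} \mathbf{1}_{\{x_i(k) \le x_i\}} . \tag{5.58}$$

We use these estimated cumulative distributions to obtain the nearly Gaussian
variables $\hat{y}_i$ as

$$\hat{y}_i(k) = \Phi^{-1}\left(\hat{F}_i(x_i(k))\right) , \quad k \in \{1,\ldots,T\} . \tag{5.59}$$

The sample covariance matrix $\hat{\rho}$ is estimated by the expression

$$\hat{\rho} = \frac{1}{T} \sum_{i=1}^{T} \hat{y}(i) \cdot \hat{y}(i)^{t} , \tag{5.60}$$

which allows us to calculate the variable

$$\hat{z}^2(k) = \sum_{i,j=1}^{2} \hat{y}_i(k)\, (\hat{\rho}^{-1})_{ij}\, \hat{y}_j(k) , \tag{5.61}$$

as defined in (5.56) for $k \in \{1,\ldots,T\}$. This variable $\hat{z}^2(k)$ should be
distributed according to a $\chi^2$-distribution if the Gaussian copula hypothesis is
correct.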
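
The chain (5.58)-(5.61) can be sketched in a few lines of Python with NumPy and SciPy. One departure from the formulas above is an assumption of ours: the empirical distribution function is rescaled by $T/(T+1)$, as suggested in Sect. 5.1.2, so that $\Phi^{-1}$ remains finite at the largest observation. The function name is illustrative.

```python
import numpy as np
from scipy import stats


def z2_statistic(x):
    """Pseudo-observations of z^2 for a sample x of shape (T, N)."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    # (5.58): empirical distribution function evaluated at the data,
    # rescaled by T/(T+1) (our assumption) to avoid Phi^{-1}(1) = infinity.
    ranks = np.column_stack([stats.rankdata(col) for col in x.T])
    # (5.59): nearly Gaussian variables.
    y = stats.norm.ppf(ranks / (T + 1.0))
    # (5.60): sample covariance matrix of the nearly Gaussian variables.
    rho = y.T @ y / T
    # (5.61): quadratic form y(k)^t rho^{-1} y(k) for every k.
    z2 = np.einsum("ki,ij,kj->k", y, np.linalg.inv(rho), y)
    return z2, rho
```

By construction the sample mean of the returned $\hat{z}^2(k)$ equals $N$, the mean of a $\chi^2$ variable with $N$ degrees of freedom, so only the shape of the distribution carries information about the copula.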
As recalled in Chap. 2, a standard way of comparing an empirical with a
theoretical distribution is to measure the distance between these two
distributions and to perform the Kolmogorov test or the Anderson-Darling test (for
a better accuracy in the tails of the distribution). The Kolmogorov distance
is the maximum local distance among all quantiles, which is most often
realized in the bulk of the distribution, while the Anderson-Darling distance puts
the emphasis on the tails of the two distributions by a suitable normalization
(which is nothing but the local standard deviation of the fluctuations of the
distance). These two distances can be complemented by two additional
measures which are defined as averages of the Kolmogorov distance and of the
Anderson-Darling distance respectively,


$$\text{Kolmogorov:} \quad d_1 = \max_{z} \left| F_{z^2}(z^2) - F_{\chi^2}(z^2) \right| \tag{5.62}$$

$$\text{average Kolmogorov:} \quad d_2 = \int \left| F_{z^2}(z^2) - F_{\chi^2}(z^2) \right| dF_{\chi^2}(z^2) \tag{5.63}$$

$$\text{Anderson-Darling:} \quad d_3 = \max_{z} \frac{\left| F_{z^2}(z^2) - F_{\chi^2}(z^2) \right|}{\sqrt{F_{\chi^2}(z^2)\left[1 - F_{\chi^2}(z^2)\right]}} \tag{5.64}$$

$$\text{average Anderson-Darling:} \quad d_4 = \int \frac{\left| F_{z^2}(z^2) - F_{\chi^2}(z^2) \right|}{\sqrt{F_{\chi^2}(z^2)\left[1 - F_{\chi^2}(z^2)\right]}}\, dF_{\chi^2}(z^2) \tag{5.65}$$

The Kolmogorov distance $d_1$ and its average $d_2$ are more sensitive to the
deviations occurring in the bulk of the distributions. In contrast, the
Anderson-Darling distance $d_3$ and its average $d_4$ are more accurate in the tails of the
distributions. Considering statistical tests for these four distances is important
in order to be as complete as possible with respect to the different
sensitivities of the tests. The averaging introduced in the distances $d_2$ and $d_4$ (which
are simply the averages of $d_1$ and $d_3$ respectively) provides important
information. Indeed, the distances $d_1$ and $d_3$ are mainly controlled by the point
that maximizes the argument within the $\max(\cdot)$ function. They can thus be
quite sensitive to the presence of an outlier. By averaging, $d_2$ and $d_4$ become
less sensitive to outliers, since the weight of such points is only of order $1/T$
(where $T$ is the size of the sample) while it equals one for $d_1$ and $d_3$.

For the usual Kolmogorov and Anderson-Darling distances, the law of the
empirical counterparts of $d_1$ and $d_3$ is known, at least asymptotically. In
addition, it is free from the underlying distribution. However, for such a result
to hold, one needs to know the exact value of the covariance matrix $\rho$ and
the exact expression of the marginal distribution functions $F_i$. In the present
case, the variables $\hat{z}^2(k)$, given by (5.61), are only pseudo-observations since
their assessment requires the preliminary estimation of the covariance matrix
$\rho$ and the marginal distribution functions $F_i$. And, as outlined in [200, 201],
when one considers the empirical process constructed from the pseudo-sample
$\{\hat{z}^2(k)\}_{k=1}^{T}$, the limiting behavior is not the same as in the case where one
would actually observe $z^2(k)$, because there are two extra terms: one due to
the fact that $F_i$ is replaced by $\hat{F}_i$ and another one due to the fact that $\rho$
is replaced by $\hat{\rho}$. Therefore, one cannot directly use the asymptotic results
known for these standard statistical tests.
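
As an illustration, the empirical counterparts of the four distances (5.62)-(5.65) can be computed as in the following Python sketch. Two discretization choices are assumptions of ours: the empirical distribution function is evaluated at the sorted sample points with the midpoint convention $(k - 1/2)/T$, and the integrals with respect to $dF_{\chi^2}$ are approximated by the trapezoidal rule on the $\chi^2$ probability scale.

```python
import numpy as np
from scipy import stats


def distances(z2, dof=2):
    """Empirical counterparts of d1, d2, d3, d4 for a sample of z^2."""
    z = np.sort(np.asarray(z2, dtype=float))
    T = z.size
    f_emp = (np.arange(1, T + 1) - 0.5) / T   # empirical cdf (midpoint convention)
    f_chi = stats.chi2.cdf(z, df=dof)          # theoretical chi^2 cdf
    diff = np.abs(f_emp - f_chi)
    # Anderson-Darling normalization by the local standard deviation.
    weight = np.sqrt(f_chi * (1.0 - f_chi))
    ad = diff / np.maximum(weight, 1e-12)
    d1 = float(diff.max())
    d3 = float(ad.max())
    # Averages with respect to dF_chi2: trapezoidal rule on the f_chi scale.
    steps = np.diff(f_chi)
    d2 = float(np.sum(0.5 * (diff[1:] + diff[:-1]) * steps))
    d4 = float(np.sum(0.5 * (ad[1:] + ad[:-1]) * steps))
    return d1, d2, d3, d4
```

A single point contributes to $d_2$ and $d_4$ only through a step of order $1/T$, whereas it can determine $d_1$ or $d_3$ entirely, which is the outlier sensitivity discussed above.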

As a very simple remedy, one can use a bootstrap method [143], whose
accuracy has been proved to be at least as good as that given by asymptotic
methods used to derive the theoretical distributions [97]. For the assessment
of the asymptotic laws of $d_2$ and $d_4$, such a numerical study is compulsory
since, even for the true observations $z^2(k)$, one does not know the expression
of the asymptotic laws. Putting all this together, a possible implementation
of this testing procedure is the following:


1. Given the original sample $\{x(t)\}_{t=1}^{T}$, generate the pseudo-Gaussian
variables $\hat{y}(t)$, $t \in \{1,\ldots,T\}$, defined by (5.59).

2. Then, estimate the covariance matrix $\hat{\rho}$ of the pseudo-Gaussian variables
$\hat{y}$, which allows one to compute the variables $\hat{z}^2$ and then measure the
distance of their estimated distribution to the $\chi^2$-distribution.

3. Given this covariance matrix $\hat{\rho}$, generate numerically a sample of $T$
bivariate Gaussian random vectors with the same covariance matrix $\hat{\rho}$.

4. For the sample of Gaussian vectors synthetically generated with covariance
matrix $\hat{\rho}$, estimate its sample covariance matrix $\tilde{\rho}$ and its marginal
distribution functions $\tilde{F}_i$.

5. To each of the $T$ vectors of the synthetic Gaussian sample, associate the
corresponding realization of the random variable $z^2$, called $\tilde{z}^2(t)$.

6. Construct the empirical distribution for the variable $\tilde{z}^2$ and measure the
distance between this distribution and the $\chi^2$-distribution.

7. Repeat steps 3 to 6 10,000 times (for instance), and then obtain an
accurate estimate of the cumulative distribution of distances between the
distribution of the synthetic Gaussian variables and the theoretical
$\chi^2$-distribution. This cumulative distribution represents the test statistic,
which allows one to reject or not the null hypothesis $H_0$ at a given
significance level.

8. The significance of the distance obtained at step 2 for the true variables
(i.e., the probability to observe, at random and under $H_0$, a distance larger
than the empirically estimated distance) is finally obtained by a simple
reading on the complementary cumulative distribution estimated at
step 7.
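
The eight steps above can be sketched as follows in Python. This is a minimal illustration rather than the implementation used in [334]: it handles only the Kolmogorov distance $d_1$, uses the $T/(T+1)$ rescaling of the empirical distribution function, and defaults to 1,000 replications instead of 10,000. All function names are ours.

```python
import numpy as np
from scipy import stats


def z2_statistic(x):
    """Steps 1-2 (and 4-5): pseudo-Gaussian variables, covariance and z^2."""
    T = x.shape[0]
    ranks = np.column_stack([stats.rankdata(col) for col in x.T])
    y = stats.norm.ppf(ranks / (T + 1.0))   # rescaled ranks: our assumption
    rho = y.T @ y / T
    z2 = np.einsum("ki,ij,kj->k", y, np.linalg.inv(rho), y)
    return z2, rho


def kolmogorov_distance(z2, dof):
    """Distance d1 between the empirical law of z^2 and the chi^2 law."""
    z = np.sort(z2)
    T = z.size
    f_chi = stats.chi2.cdf(z, df=dof)
    upper = np.arange(1, T + 1) / T - f_chi   # just after each jump
    lower = f_chi - np.arange(0, T) / T       # just before each jump
    return float(max(upper.max(), lower.max()))


def bootstrap_p_value(x, n_boot=1000, seed=0):
    """Steps 3-8: probability, under H0, of a distance at least as large."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    T, n = x.shape
    z2, rho = z2_statistic(x)
    d_obs = kolmogorov_distance(z2, n)                  # step 2
    d_sim = np.empty(n_boot)
    for b in range(n_boot):                             # step 7
        synthetic = rng.multivariate_normal(np.zeros(n), rho, size=T)  # step 3
        z2_b, _ = z2_statistic(synthetic)               # steps 4-5
        d_sim[b] = kolmogorov_distance(z2_b, n)         # step 6
    return float(np.mean(d_sim >= d_obs))               # step 8
```

Because each synthetic sample goes through the same estimation of marginals and covariance as the data, the two extra terms affecting the law of the pseudo-observations are reproduced under $H_0$. The same loop applies to $d_2$, $d_3$ and $d_4$ by swapping the distance function.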


5.2.2 Empirical Results 


Empirical tests implementing the previous procedure have been performed 
on securities, exchange rates, and commodities (metals) in [334]. Focusing on 
securities and exchange rates, we summarize some of the most striking features 
concerning the results obtained in this study. 


Currencies


The Federal Reserve Board provides access to a large set of historical quotes
of spot foreign exchange rates. Following [334], let us focus on the Swiss Franc,
the German Mark, the Japanese Yen, the Malaysian Ringgit, the Thai Baht
and the British Pound during the time interval of ten years from 25 January,
1989 to 31 December, 1998. All these exchange rates are expressed against
the US dollar.

At the 95% significance level, one observes that only 40% (according to $d_1$
and $d_3$) but 60% (according to $d_2$ and $d_4$) of the tested pairs of currencies are
compatible with the Gaussian copula hypothesis over the entire time interval.
During the first half-period from 25 January, 1989 to 11 January, 1994, 47%
(according to $d_3$) and up to about 75% (according to $d_2$ and $d_4$) of the tested
currency pairs are compatible with the assumption of the Gaussian copula,
while during the second subperiod from 12 January, 1994 to 31 December,
1998, between 66% (according to $d_1$) and about 75% (according to $d_2$, $d_3$
and $d_4$) of the currency pairs remain compatible with the Gaussian copula
hypothesis. These results raise several comments both from a statistical and
from an economic point of view.

We first have to stress that the most significant rejection of the Gaussian
copula hypothesis is obtained for the distance $d_3$, which is indeed the most
sensitive to the events in the tail of the distributions. The test statistics given
by this distance can indeed be very sensitive to the presence of a single large
event in the sample, so much so that the Gaussian copula hypothesis can
be rejected only because of the presence of this single event (outlier). The
difference between the results given by $d_3$ and $d_4$ (the averaged $d_3$) is very
significant in this respect. The case of the German Mark and the Swiss Franc
provides a particularly startling example. Indeed, during the time interval
from 12 January, 1994 to 31 December, 1998, the probability $p(d)$ of
non-rejection is rather high according to $d_1$, $d_2$ and $d_4$ ($p(d) > 31\%$) while it is
very low according to $d_3$: $p(d) = 0.05\%$, which should lead to the rejection
of the Gaussian copula hypothesis on the basis of the distance $d_3$ alone. This
discrepancy between the different distances suggests the presence of an outlier
in the sample.

To check this hypothesis, we show in the upper panel of Fig. 5.3 the function

$$f_3(t) = \frac{\left| F_{z^2}(z^2(t)) - F_{\chi^2}(z^2(t)) \right|}{\sqrt{F_{\chi^2}(z^2(t))\left[1 - F_{\chi^2}(z^2(t))\right]}} \tag{5.66}$$

used in the definition of the Anderson-Darling distance $d_3 = \max_z f_3(z)$ (see
definition (5.64)), expressed in terms of time $t$ rather than $z^2$. The functions
have been computed over the two time subintervals separately.

Apart from three extreme peaks occurring on 20 June, 1989, 19 August,
1991 and 16 September, 1992 within the first time subinterval and one
extreme peak on 10 September, 1997 within the second time subinterval, the
statistical fluctuations measured by $f_3(t)$ remain small and of the same
order. Removing the contribution of these outlier events in the determination of
$d_3$, the new statistical significance derived according to $d_3$ becomes similar to
that obtained with $d_1$, $d_2$ and $d_4$ on each subinterval. From the upper panel
of Fig. 5.3, it is clear that the Anderson-Darling distance $d_3$ is equal to the
height of the largest peak corresponding to the event on 19 August, 1991 for
the first period and to the event on 10 September, 1997 for the second period.
These events are depicted by a circled dot in the two lower panels of Fig. 5.3,
which represent the return of the German Mark versus the return of the Swiss
Franc over the two considered time periods.

Fig. 5.3. The upper panel represents the graph of the function $f_3(t)$ defined in
(5.66) used in the definition of the distance $d_3$ for the couple Swiss Franc/German
Mark as a function of time $t$, over the time intervals from 25 January, 1989 to
11 January, 1994 and from 12 January, 1994 to 31 December, 1998. The two lower
panels represent the scatter plot of the return of the German Mark versus the return
of the Swiss Franc during the two previous time periods. The circled dot, in each
figure, shows the pair of returns responsible for the largest deviation of $f_3$ during
the considered time interval. Reproduced from [332]

The event on 19 August, 1991 is associated with the coup against
Gorbachev in Moscow: the German Mark (respectively the Swiss Franc) lost
3.37% (respectively 0.74%) against the US dollar. The 3.37% drop of the German
Mark is the largest daily move of this currency against the US dollar over
the whole first period. On 10 September, 1997, the German Mark appreciated
by 0.60% against the US dollar while the Swiss Franc lost 0.79%, which
represents a moderate move for each currency, but a large joint move. This event is
related to the contradictory announcements of the Swiss National Bank about
its monetary policy, which put an end to a rally of the Swiss Franc along with
the German Mark against the US dollar.

Thus, removing the large moves associated with major historical events or
events associated with unexpected incoming information⁷ (which cannot be
accounted for in a statistical study, unless one relies on a stress-test
analysis), we obtain, for $d_3$, significance levels compatible with those obtained with
the other distances. We can thus conclude that, according to the four
distances, during the time interval from 12 January, 1994 to 31 December, 1998
the Gaussian copula hypothesis cannot be rejected for the couple German
Mark/Swiss Franc.

From an economic point of view, the impact of regulatory mechanisms
between currencies or monetary crises can be well identified by the rejection
or the absence of rejection of the null hypothesis. Indeed, consider the couple
German Mark/British Pound. During the first half period, their correlation
coefficient is very high ($\rho = 0.82$) and the Gaussian copula hypothesis is
strongly rejected according to the four distances. On the contrary, during
the second half period, the correlation coefficient decreases significantly ($\rho =
0.56$) and none of the four distances allows us to reject the null hypothesis.
Such non-stationarity can be easily explained. Indeed, on 1 January, 1990,
the British Pound entered the European Monetary System (EMS), so that
the exchange rate between the German Mark and the British Pound was not
allowed to fluctuate beyond a margin of 2.25%. However, due to a strong
speculative attack, the British Pound was devalued in September 1992 and
had to leave the EMS. Thus, between January 1990 and September 1992, the
exchange rate of the German Mark and the British Pound was confined within
a narrow spread, incompatible with the Gaussian copula description. After
1992, the British Pound exchange rate floated with respect to the German
Mark, and the dependence between the two currencies decreased, as shown
by their correlation coefficient. In this latter regime, one can no longer reject
the Gaussian copula hypothesis.

The impact of major crises on the copula can also be clearly identified.
An example is given by the Malaysian Ringgit/Thai Baht couple. During the
period from January 1989 to January 1994, these two currencies underwent
only moderate and weakly correlated fluctuations ($\rho = 0.29$), so that the
null hypothesis cannot be rejected at the 95% significance level. In contrast,
during the period from January 1994 to October 1998, the Gaussian copula
hypothesis is strongly rejected. This rejection is obviously due to the persistent
and dependent ($\rho = 0.44$) shocks incurred by the Asian financial and monetary
markets during the 7 months of the Asian Crisis from July 1997 to January
1998 [29, 262].

⁷ Modeling the volatility by a mean reverting stochastic process with long memory
(the multifractal random walk (MRW)), Sornette et al. [456] have demonstrated
the outlier nature of the event on 19 August, 1991.

These two cases show that the Gaussian copula hypothesis can be consid- 
ered reasonable for currencies in the absence of regulatory mechanisms and of 
strong and persistent crises. They also provide an understanding of why the 
results of the test over the entire sample are so much weaker than the results 
obtained for the two subintervals: the time series are strongly nonstationary. 


Stocks 


Let us now turn to the description of the dependence properties of the dis- 
tributions of daily returns for a diversified set of stocks among the largest 
companies quoted on the New York Stock Exchange. We report the results 
presented in [334] concerning Appl. Materials, AT&T, Citigroup, Coca Cola, 
EMC, Exxon-Mobil, Ford, General Electric, General Motors, Hewlett Packard, 
IBM, Intel, MCI WorldCom, Medtronic, Merck, Microsoft, Pfizer, Procter & 
Gamble, SBC Communication, Sun Microsystem, Texas Instruments, and Wal 
Mart. 

The dataset covers the time interval from 8 February, 1991 to 29 December,
2000. At the 95% significance level, 75% of the pairs of stocks are compatible
with the Gaussian copula hypothesis. Over the time subinterval from February
1991 to January 1996, this percentage becomes larger than 99% for $d_1$, $d_2$
and $d_4$ while it equals 94% according to $d_3$. Over the time subinterval from
February 1996 to December 2000, 92% of the pairs of stocks are compatible
with the Gaussian copula hypothesis according to $d_1$, $d_2$ and $d_4$ and more
than 79% according to $d_3$. Therefore, the Gaussian copula assumption is much
more widely accepted for stocks than it was for the currencies reported above.
In addition, the nonstationarity observed for currencies does not seem very
prominent for stocks.

For the sake of completeness, let us add a word concerning the results of
the tests performed for five stocks belonging to the computer sector: Hewlett
Packard, IBM, Intel, Microsoft, and Sun Microsystem. During the first half
period (from Feb. 1991 to Jan. 1996), all the pairs of stocks are compatible with
the Gaussian copula hypothesis at the 95% significance level. The results are rather
different for the second half period (from Feb. 1996 to Dec. 2000) since about 40%
of the pairs of stocks reject the Gaussian copula hypothesis according to $d_1$,
$d_2$ and $d_3$. This can certainly be ascribed to the existence of a few shocks,
notably associated with the crash of the “new economy” in March-April 2000
[450]. However, on the whole, it appears that there is no systematic rejection
of the Gaussian copula hypothesis for stocks within the same industrial sector,
notwithstanding the fact that one can expect correlations stronger than the
average between such stocks.


5.3 Limits of the Description in Terms
of the Gaussian Copula


5.3.1 Limits of the Tests 


A severe limitation of existing tests applied to Gaussian copulas [334, 350] 
is their inability to clearly distinguish between Gaussian and some relatively 
close alternative models such as the Student’s copulas when these latter cop- 
ulas have a sufficiently large number of degrees of freedom, typically of the 
order of or larger than 10-20. As recalled in Chap. 3, the Student copula be- 
comes very close to the Gaussian copula in its bulk when it has a large number 
of degrees of freedom. In contrast, these two copulas still differ significantly 
in the corners of the unit square (see Figs. 3.2 and 3.3). This difference has 
no serious consequences for “normal” events but leads to important implica- 
tions for extremes. Indeed, as discussed in Sect. 4.5.3, an alternative model to 
the Gaussian copula, such as the Student’s copula, presents a significant tail 
dependence, even for moderately large numbers of degrees of freedom, while 
the Gaussian copula has absolutely no asymptotic tail dependence; these tail 
dependences are controlled mathematically by the behavior of the copulas in 
the corners of the unit square. Therefore, if the tests previously described are 
unable to distinguish between a Student’s and a Gaussian copula, Occam’s
razor (simplicity and parsimony) suggests choosing the Gaussian copula and,
as a consequence, one may severely underestimate the dependence between
extreme events if the correct description turns out to be the Student’s
copula. This may have catastrophic consequences in risk assessment and portfolio
management.

Figure 4.8 p. 173 provides a quantification of the dangers incurred by
mistaking a Student copula for a Gaussian one. Consider the case of a Student
copula with $\nu = 20$ degrees of freedom with a correlation coefficient $\rho$ lower
than 0.3-0.4; its tail dependence $\lambda_{\nu}(\rho)$ turns out to be less than 0.7%, i.e.,
the probability that one variable becomes extreme knowing that the other
one is extreme is less than 0.7%. In this case, the Gaussian copula with a zero
probability of simultaneous extreme events is not a bad approximation of the
Student’s copula. In contrast, consider a Student copula with a correlation
$\rho$ larger than 0.7-0.8, corresponding to a tail dependence larger than 10%,
which is a nonnegligible probability for simultaneous extreme events. The
effect of tail dependence becomes of course much stronger as the number $\nu$ of
degrees of freedom decreases.
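
These orders of magnitude can be checked numerically. The short Python sketch below uses the standard closed-form expression of the tail dependence coefficient of the bivariate Student copula, $\lambda_{\nu}(\rho) = 2\,\bar{T}_{\nu+1}\left(\sqrt{\nu+1}\,\sqrt{(1-\rho)/(1+\rho)}\right)$, where $\bar{T}_{\nu+1}$ is the complementary distribution function of the Student distribution with $\nu + 1$ degrees of freedom. This formula is quoted here as a known result and is not restated in the present section; the function name is ours.

```python
import numpy as np
from scipy import stats


def student_tail_dependence(nu, rho):
    """Tail dependence coefficient of a bivariate Student copula."""
    arg = np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    # sf is the survival function 1 - cdf of the Student distribution.
    return 2.0 * stats.t.sf(arg, df=nu + 1.0)


if __name__ == "__main__":
    for rho in (0.3, 0.5, 0.8):
        print(rho, student_tail_dependence(20.0, rho))
```

For fixed $\rho$ the coefficient increases as $\nu$ decreases, in line with the last remark above.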

These examples stress the importance of determining whether the previous
testing procedure distinguishes between a Student copula with $\nu = 20$ (or less)
degrees of freedom and a given correlation coefficient of the order of $\rho = 0.5$,
for instance, and a Gaussian copula with an appropriate correlation coefficient
$\rho'$. Due to the strikingly different behavior of these two models in the extremes,
the non-rejection of the Gaussian copula hypothesis previously found for most
assets can lead to an underestimation of the extreme risks, as a result of the
weak sensitivity of the test in the extreme regions of the copula. It is therefore
important to discuss the sensitivity of the test presented in Sect. 5.2.1 and to
review the other alternatives proposed in the literature.

Fig. 5.4. Probability of non-rejection of the Gaussian copula hypothesis when the
true copula is given by a Student copula with $\nu$ degrees of freedom and a correlation
coefficient equal to $\rho$ (error of type II: “false positive”), when the error of type I
(“false negative”) of the test is set equal to 5%, for the four distances $d_1$-$d_4$


5.3.2 Sensitivity of the Method 


The previous section has found that the Gaussian copula provides a reasonably 
good model, in the sense that it cannot be rejected by a statistical test at the 
95% significance level. However, could this be due to the lack of power of the 
statistical test rather than to the goodness of the Gaussian copula? 

Let us denote by H_{ν,ρ} the hypothesis that the true copula of the data is
the Student copula with ν degrees of freedom and correlation coefficient
ρ. Considering the alternative hypothesis H_{ν,ρ}, one needs to know the
probability that one cannot reject the null hypothesis H_0 when the true
model is H_{ν,ρ}. A complementary question is: what is the minimum p-value
(significance level) of the test allowing us to reject the Gaussian copula
hypothesis, for instance, 95 times out of 100 when the true copula is the Student
copula? Answering these questions on the power of the test requires a numerical
study.
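
A numerical study of this kind can be organized along the following lines. The
sketch below is only an illustration under simplifying assumptions and is not the
procedure used to produce Fig. 5.4: it treats the bivariate case, it replaces the
four distances by a plain Kolmogorov distance between the variable Z² and a χ²
law, and it relies on the textbook Kolmogorov-Smirnov p-value, whereas an exact
level would require a bootstrap calibration because the correlation and the
margins are estimated. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def sample_student_copula(T, nu, rho, rng):
    """Draw T points from a bivariate Student copula (nu d.o.f., correlation rho)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    gauss = rng.multivariate_normal(np.zeros(2), cov, size=T)
    mix = np.sqrt(rng.chisquare(nu, size=T) / nu)
    student = gauss / mix[:, None]           # bivariate Student vector
    return stats.t.cdf(student, df=nu)       # uniform margins, Student copula


def gaussian_copula_pvalue(u):
    """Approximate p-value of the Gaussian copula hypothesis (assumed KS variant)."""
    T, n = u.shape
    ranks = np.argsort(np.argsort(u, axis=0), axis=0) + 1
    y = stats.norm.ppf(ranks / (T + 1.0))    # normal scores from empirical margins
    rho_hat = np.corrcoef(y, rowvar=False)
    z2 = np.einsum("ti,ij,tj->t", y, np.linalg.inv(rho_hat), y)
    return stats.kstest(z2, "chi2", args=(n,)).pvalue


def non_rejection_rate(nu, rho, T=500, n_rep=200, level=0.05, seed=0):
    """Fraction of Student-copula samples for which the Gaussian copula is not rejected."""
    rng = np.random.default_rng(seed)
    pvals = [gaussian_copula_pvalue(sample_student_copula(T, nu, rho, rng))
             for _ in range(n_rep)]
    return float(np.mean(np.array(pvals) > level))
```

Calling `non_rejection_rate` on a grid of values of ν and ρ gives a rough
picture of the regions in which the test cannot tell the two copulas apart.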

Figure 5.4 shows the minimum p-value of the test, denoted by p_{95%}, as
a function of the (inverse of the) number of degrees of freedom ν and of the
correlation coefficient ρ of the true Student copula. Overall, the four tests
associated with the four different distances d_1–d_4 behave similarly. As expected,
for large ν, namely ν > 10–20 (1/ν < 0.05–0.1), a very high p-value is
required to reject the Gaussian hypothesis. In such a case, it is almost
impossible to distinguish between the Gaussian hypothesis and a Student copula
for most realizations. If one leaves out distance d_3, the power of the tests
is almost independent of the value of the correlation coefficient. For d_3, the
power is clearly weaker for the smallest correlations.

In the light of these results on the performance of the tests, the previous
conclusion on the relevance of the Gaussian copula for the modeling of the
dependence between financial risks must be reconsidered. Concerning currencies,
the non-rejection of the Gaussian copula hypothesis does not exclude,
at the 95% significance level, that the dependence of the currency pairs may
be described by a Student copula with adequate values of ν and ρ. For the
German Mark/Swiss Franc pair, a Student copula with about five degrees of
freedom was found to obtain the same p-values [334]. For the correlation
coefficient ρ = 0.92 of the German Mark/Swiss Franc pair, Student’s copula with
five degrees of freedom predicts a tail dependence coefficient λ_5(0.92) = 63%,
in contrast with a zero value for the Gaussian copula. Such a large value of
λ_5(0.92) implies that, when an extreme event occurs for the German Mark, it
also occurs for the Swiss Franc with a frequency of 63%. Therefore, a stress
scenario based on the assumption of a Gaussian copula would fail to account
for such coupled extreme events, which may represent as many as two-thirds
of all extreme events, if it turned out that the true copula was Student’s
copula with five degrees of freedom. Note that, with such a large value of the
correlation coefficient, the tail dependence remains high even if the number
of degrees of freedom is as large as 20 or more (see Fig. 4.8).
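
The order of magnitude quoted above can be checked with the standard closed-form
expression of the tail dependence coefficient of a bivariate Student copula,
λ_ν(ρ) = 2 t_{ν+1}(−√(ν+1) √((1−ρ)/(1+ρ))), where t_{ν+1} is the Student
distribution function with ν+1 degrees of freedom. We assume here that this is
the expression behind the figures given in the text; the function name is
illustrative.

```python
import numpy as np
from scipy import stats


def student_tail_dependence(nu, rho):
    """Tail dependence coefficient of a bivariate Student copula."""
    arg = np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return float(2.0 * stats.t.cdf(-arg, df=nu + 1.0))


# German Mark / Swiss Franc example of the text: nu = 5, rho = 0.92
lam_dem_chf = student_tail_dependence(5, 0.92)   # about 0.63
```

Evaluating the same function for larger ν at fixed ρ = 0.92 shows how slowly the
coefficient decays when the correlation is this strong.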

The Swiss Franc and Malaysian Ringgit pair offers a very different case. For
instance, during the time period from January 1994 to December 1998, the test
statistics are so high that the description of the dependence with Student’s
copula would require it to have at least 7-10 degrees of freedom. In addition,
the correlation coefficient of the two currencies is only ρ = 0.16, so that, even
in the most pessimistic situation ν = 7, the choice of the Gaussian copula
would amount to neglecting the tail dependence coefficient λ_7(0.16) = 4%
predicted by Student’s copula. In this case, stress scenarios based on the
Gaussian copula would predict uncoupled extreme events, which would be
wrong only once in 25 times.

These two examples highlight the fact that, as much as the number of 
degrees of freedom of Student’s copula which is necessary to describe the 
data, the correlation coefficient remains an important parameter. 
5.3.3 The Student Copula: An Alternative?


The tests performed in [83, 350] show that the Student copula can provide 
a significantly better description of the data than the Gaussian copula, par- 
ticularly for foreign exchange (FX) rates, for which the number of degrees of 
freedom of the Student copula is about 5-6 for daily returns. In both cases, 
the testing procedure is based on the pseudo likelihood estimation method 
detailed in Sect. 5.1.2. 

Using the Akaike information criterion defined by the following formula:


\[ \mathrm{AIC} = -2 \ln \mathcal{L}\left(\{\hat{u}_1(i), \ldots, \hat{u}_n(i)\}; \hat{\theta}\right) + 2 \dim \theta , \tag{5.67} \]


Breymann and his co-authors [83] have shown that the dependence structure
of the German Mark/Japanese Yen pair is better described by Student’s
copula with about six degrees of freedom (for daily returns) than by
a Gaussian copula, the latter being the second best copula among a set of
five copulas that also comprises Clayton’s, Gumbel’s, and Frank’s copulas. This
result refines those obtained in [334] and is in line with the results obtained
by non-parametric and semiparametric estimation shown in Figs. 5.1–5.2. In
addition, Student’s copula is found to provide an even better description when
one considers FX returns calculated at smaller time scales [83]. Indeed, the
Student copula seems to provide a reliable model for FX returns calculated
for time scales larger than 2 hours. The number of degrees of freedom is found
to increase with the time scale: it increases from 4 at the 2-hour time scale to
6 at the daily time scale. Such a result is expected since, under time
aggregation, the distribution of returns should converge to the Gaussian
distribution according to the central limit theorem; the dependence structure
of the returns is therefore expected to also converge toward the Gaussian copula
at large time scales. At time scales smaller than 2 hours, the study by Breymann
et al. shows that neither the Gaussian nor the Student copula is sufficient
to describe the dependence structure of the distributions of FX returns. At
these small time scales, microstructural effects probably come into play and
require more elaborate copulas to model the dependences observed at very
high frequencies.
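
A comparison of copulas by the Akaike criterion can be sketched as follows for a
bivariate sample. This is a simplified illustration, not the estimation procedure
of [83]: the pseudo-observations are rescaled ranks, the correlation (or shape)
matrix is a plug-in estimate computed from normal scores rather than a full
maximizer of the pseudo likelihood, and ν is scanned over a user-supplied grid.
The function names are hypothetical, and `scipy.stats.multivariate_t` requires
SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats


def pseudo_observations(x):
    """Rescaled ranks T/(T+1) * empirical cdf, column by column (no ties assumed)."""
    T = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (T + 1.0)


def gaussian_copula_loglik(u):
    """Pseudo log-likelihood of the Gaussian copula with plug-in correlation."""
    y = stats.norm.ppf(u)
    corr = np.corrcoef(y, rowvar=False)
    joint = stats.multivariate_normal(mean=np.zeros(u.shape[1]), cov=corr).logpdf(y)
    return float(np.sum(joint - stats.norm.logpdf(y).sum(axis=1)))


def student_copula_loglik(u, nu):
    """Pseudo log-likelihood of the Student copula for a fixed nu (plug-in shape matrix)."""
    corr = np.corrcoef(stats.norm.ppf(u), rowvar=False)   # assumption: plug-in estimate
    x = stats.t.ppf(u, df=nu)
    joint = stats.multivariate_t(loc=np.zeros(u.shape[1]), shape=corr, df=nu).logpdf(x)
    return float(np.sum(joint - stats.t.logpdf(x, df=nu).sum(axis=1)))


def aic(loglik, n_params):
    """Akaike information criterion: -2 ln L + 2 dim(theta)."""
    return -2.0 * loglik + 2.0 * n_params


def compare_by_aic(x, nu_grid=range(2, 31)):
    """Return (AIC Gaussian, best AIC Student, best nu) for a bivariate sample x."""
    u = pseudo_observations(x)
    aic_gauss = aic(gaussian_copula_loglik(u), 1)          # one parameter: rho
    scores = {nu: aic(student_copula_loglik(u, nu), 2) for nu in nu_grid}  # rho and nu
    best_nu = min(scores, key=scores.get)
    return aic_gauss, scores[best_nu], best_nu
```

The model with the smaller AIC is preferred; the penalty term 2 dim θ is what
prevents the Student copula from winning merely because it has one more
parameter.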

In addition, for all time scales, the copula of the bivariate excess returns
for high (or low) threshold appears to be best described by Clayton’s (or by
the survival Clayton) copula. This result suggests the following interpretation
of the existence of concomitant extremes. Assume that, conditional on
a frailty random variable representing the information flow, the asset returns
are independent. The copula of the returns of the two assets then exhibits the
behavior reported in the study by Breymann et al. [83] if one assumes that
the random variable representing the information flow has a regularly varying
distribution, which means that pieces of information with great impact on
asset returns arrive relatively often.
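
The frailty mechanism can be made concrete with the textbook construction of
Clayton’s copula, in which two variables are independent conditionally on a
common gamma-distributed frailty. The gamma law is an assumption made here for
concreteness and is not a choice stated in the text; the function names are
illustrative.

```python
import numpy as np
from scipy import stats


def sample_clayton_by_frailty(n, theta, rng):
    """Bivariate Clayton sample (parameter theta > 0) built from a common frailty."""
    frailty = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)   # shared random factor
    e = rng.exponential(size=(n, 2))                            # independent given the frailty
    return (1.0 + e / frailty[:, None]) ** (-1.0 / theta)


def empirical_kendall_tau(u):
    """Kendall's tau of a bivariate sample."""
    return float(stats.kendalltau(u[:, 0], u[:, 1])[0])


def clayton_kendall_tau(theta):
    """Theoretical Kendall's tau of Clayton's copula."""
    return theta / (theta + 2.0)
```

Comparing `empirical_kendall_tau` on a simulated sample with
`clayton_kendall_tau` is a quick consistency check of the construction.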

In contrast with the case of foreign exchange rates, the estimated number of
degrees of freedom of the Student copula best fitting the dependence between
stocks is rather high, so that the probability of concomitant extreme risks
remains weak for their usual level of correlation. In the study by Mashal
and Zeevi [350], the dependence between stocks is claimed to be significantly
better accounted for by a Student copula with 11-12 degrees of freedom than
by a Gaussian copula. Their conclusions are drawn from a generalization of
the log-likelihood ratio (or Wilks) test, based on the fact that the Gaussian
copula is nothing but a Student copula with an infinite number of degrees of
freedom. This allows them to compare directly the relevance of the Student
copula with respect to the Gaussian copula. Indeed, given two nested copulas
C_1 and C_2, i.e., two copulas such that the space of parameter vectors θ_1 of
C_1 is a subspace of the space of parameter vectors θ_2 of C_2, then the statistic


\[ \Lambda_T = 2 \left[ \ln \mathcal{L}(\hat{\theta}_2) - \ln \mathcal{L}(\hat{\theta}_1) \right] , \tag{5.68} \]


where $\hat{\theta}_1$ and $\hat{\theta}_2$ are the pseudo maximum likelihood estimates of θ_1 and θ_2,
is asymptotically distributed as a χ² with one degree of freedom if dim θ_1 =
dim θ_2 − 1, up to a scale factor 1 + γ larger than one due to the use of a pseudo
maximum likelihood instead of the true maximum likelihood:


\[ \Lambda_T \xrightarrow{\text{law}} (1 + \gamma)\, \chi^2_1 , \quad \text{as } T \to \infty . \tag{5.69} \]


The positive parameter γ depends on the choice of the model and can be
determined by numerical simulations. In more general cases where dim θ_2 −
dim θ_1 = m > 1, Λ_T does not follow an asymptotic χ²_m-distribution with m
degrees of freedom as in standard tests of nested hypotheses. This results from
the fact that the log-likelihood ratio statistic does not converge to a χ²
distribution when the model is misspecified, which is the relevant situation when
using the pseudo likelihood instead of the true likelihood (see Appendix 5.B).
In such a case, the Wald or Lagrange multiplier tests are more appropriate
[209, 372].
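
In the one-parameter case, the rescaling by 1 + γ translates into a simple
correction of the p-value. The helper below is a minimal sketch with a
hypothetical name; the value of γ has to come from simulations of the model at
hand and is not provided by the text.

```python
from scipy import stats


def scaled_lr_pvalue(lr_stat, gamma):
    """p-value of Lambda_T when Lambda_T / (1 + gamma) is asymptotically chi2 with 1 d.o.f."""
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    return float(stats.chi2.sf(lr_stat / (1.0 + gamma), df=1))
```

Ignoring the scale factor (γ = 0) always yields a smaller p-value, so that the
uncorrected test rejects the Gaussian copula too often.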

While these results improve somewhat on the initial study [334], in contrast
with the case of currencies, one can question the existence of a real
improvement brought by the Student copula to describe the dependence between
stocks. Indeed, correlation coefficients between two stocks are hardly
greater than 0.4–0.5, so that the tail dependence of a Student copula with
11-12 degrees of freedom is about 2.5% or less. In view of all the different
sources of uncertainty during the estimation process in addition to the
possible non-stationarity of the data, one can doubt that such a description
eventually leads to concrete improvements for practical purposes. To highlight
this point, let us consider several portfolios made of 50% of the Standard &
Poor’s 500 index and 50% of one stock (whose name is indicated in the first
column of Table 5.1). Let us then estimate the probability P that this portfolio
incurs a loss larger than n times its standard deviation (n = 2,...,5). For
the same portfolio, let us estimate the probability P_g (resp. P_s) that it incurs
the same loss (i.e., n times its standard deviation) when the dependence between
the index and the stock is given by a Gaussian copula (resp. a Student
copula with ten degrees of freedom). The row named P/P_{g/s} gives the average
values of P/P_g and P/P_s over the 20 portfolios. For shocks of two and
three standard deviations, the values of P/P_g close to 1 indicate that the
dependence structure is correctly captured by a Gaussian copula. For shocks
of four and five standard deviations, P/P_g becomes larger than 1, showing
that large shocks are more probable than predicted by the Gaussian dependence,
and all the more so, the larger the amplitude of the shocks. This occurs
notwithstanding the use of marginals with heavy tails, suggesting the effect of
a non-zero tail dependence in the true data. In contrast, the values of P/P_s
are significantly smaller than 1, showing that the Student copula overestimates
the frequency of large shocks. In addition, this overestimation is surprisingly
worse for larger shocks (by as much as a factor 2.5) in the range in which the
Gaussian copula becomes less adequate. This suggests that the tail dependence
of the Student copula is too large to describe this data set. This simple
exercise illustrates that neither the Gaussian copula nor a Student copula with
a reasonable number of degrees of freedom provides an accurate description of
the dependence between stock returns.⁸ The discrepancies between these two
models and the real dependence structure become all the more important,
the more extreme the amplitude of the shock. And in fact, the situation is
worse for the Student copula. This suggests that, for practical applications,
Student’s copula may not provide a real improvement with respect to the
Gaussian copula for traditional portfolio management.
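
The model-implied probabilities P_g and P_s of such an exercise can be obtained
by simulation, as in the sketch below. It is an illustration only: the marginal
laws used in the text are not restated here, so Student marginals with four
degrees of freedom are assumed, and the correlation ρ = 0.4 in the usage example
is arbitrary. All names are hypothetical.

```python
import numpy as np
from scipy import stats


def copula_sample(n, rho, rng, nu=None):
    """Bivariate sample from a Gaussian copula (nu=None) or a Student copula (nu d.o.f.)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    if nu is None:
        return stats.norm.cdf(z)
    w = np.sqrt(rng.chisquare(nu, size=n) / nu)
    return stats.t.cdf(z / w[:, None], df=nu)


def loss_probabilities(u, marginal_df=4.0, n_sigmas=(2, 3, 4, 5)):
    """Pr[R < -n * sigma] for an equally weighted portfolio of the two assets."""
    u = np.clip(u, 1e-12, 1.0 - 1e-12)              # keep the quantile transform finite
    returns = stats.t.ppf(u, df=marginal_df)        # assumed heavy-tailed marginals
    portfolio = returns.mean(axis=1)                # 50% index + 50% stock
    sigma = portfolio.std()
    return {n: float(np.mean(portfolio < -n * sigma)) for n in n_sigmas}


rng = np.random.default_rng(0)
p_gauss = loss_probabilities(copula_sample(200_000, 0.4, rng))
p_student = loss_probabilities(copula_sample(200_000, 0.4, rng, nu=10))
```

Dividing an empirical estimate of P by these two sets of probabilities yields
the ratios P/P_g and P/P_s discussed above.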


5.3.4 Accounting for Heteroscedasticity 


The aforementioned studies have not taken into account, or only partially,
the well-known volatility clustering phenomenon, which certainly affects
the dependence properties of asset returns. This issue has been addressed
by Patton [380], who has shown that the two-step maximum likelihood
estimation can be extended to conditional copulas to account for the
time-varying nature of financial time series. Filtering marginal data by a GARCH
process, Patton has shown that the conditional dependence structure between
exchange rates (Japanese Yen against Euro) is better described by Clayton’s
copula than by the Gaussian copula. We also note that Muzy et al. [366] have
constructed a multivariate “multifractal” process to account for both volatility
clustering and the dependence between assets. In this case, the conditional
copula is (nearly) Gaussian.

⁸ This point confirms the doubts raised by the comparison of the nonparametric
and the semiparametric estimates of the density of the copula of the daily returns
of General Motors and Procter & Gamble, represented in Figs. 5.1–5.2.


Table 5.1. Portfolios made of 50% of the Standard & Poor’s 500 index and 50% of one stock (whose name is indicated in the first
column) are considered. The entries are 100 × Pr[R < −n·σ]

                                      n = 2               n = 3               n = 4               n = 5
                                     P   P_g   P_s       P   P_g   P_s       P   P_g   P_s       P   P_g   P_s
Abbott Labs                       2.07  2.06  2.62    0.58  0.51  0.84    0.21  0.15  0.39    0.09  0.07  0.26
American Home Products Corp.      1.98  2.07  2.72    0.51  0.56  0.98    0.30  0.24  0.43    0.17  0.13  0.22
Boeing Co.                        2.03  1.96  2.50    0.53  0.51  0.95    0.21  0.18  0.44    0.13  0.09  0.19
Bristol-Myers Squibb Co.          1.56  1.81  2.33    0.55  0.48  0.98    0.26  0.22  0.81    0.11   O01  0.42
Chevron Corp.                     1.94  1.99  2.26    0.40  0.42  0.88    0.13  0.15  0.55    0.08  0.07  0.30
Du Pont (E.I.) de Nemours & Co.   2.13  2.02  2.59    0.51  0.47  0.87    0.21  0.19  0.58    0.09  0.07  0.32
Disney (Walt) Co.                 1.83  1.87  2.40    0.47  0.53  1.28    0.24  0.22  0.73    0.15  0.12  0.43
General Motors Corp.              1.73  1.95  2.12    0.45  0.42  0.76    0.21  0.13  0.59    0.08  0.06  0.36
Hewlett-Packard Co.               1.77  2.08  2.54    0.53  0.51  0.99    0.21  0.19  0.44    0.08  0.09  0.15
Coca-Cola Co.                     1.60  1.83  2.18    0.45  0.56  0.77    0.19  0.18  0.58    0.09  0.07  0.46
Minnesota Mining & MFG Co.        1.85  2.01  2.23    0.57  0.49  0.80    0.19  0.19  0.60    0.08  0.09  0.52
Philip Morris Cos Inc.            2.00  2.07  2.33    0.45  0.46  1.10    0.21  0.19  0.65    0.13  0.12  0.34
Pepsico Inc.                      1.92  2.08  2.50    0.51  0.49  0.83    0.15  0.18  0.39    0.15  0.07  0.22
Procter & Gamble Co.              1.51  1.67  2.05    0.45  0.48  0.95    0.24  0.21  0.82    0.13  0.09  0.67
Pharmacia Corp.                   1.81  1.94  2.69    0.53  0.54  1.06    0.23  0.25  0.80    0.11  0.12  0.45
Schering-Plough Corp.             1.85  1.94  2.01    0.49  0.44  0.73    0.11  0.14  0.58    0.08  0.06  0.31
Texaco Inc.                       1.90  1.94  2.77    0.55  0.55  1.01    0.28  0.23  0.41    0.11  0.11  0.21
Texas Instruments Inc.            1.87  2.02  2.09    0.49  0.56  0.89    0.21  0.15  0.66    0.06  0.07  0.16
United Technologies Corp          2.17   2.1  2.28    0.47  0.45  0.78    0.17  0.14  0.47    0.11  0.06  0.30
Walgreen Co.                      1.81  1.96  2.28    0.47  0.41  0.92    0.23  0.14  0.40    0.09  0.08  0.21
P/P_{g/s}                               0.95  0.79          1.02  0.55          1.15  0.39          1.24  0.38

We estimate the probability P that each portfolio incurs a loss larger than n times its standard deviation (n = 2,...,5). For each
portfolio, we also estimate the probability P_g (resp. P_s) that it incurs the same loss (i.e., n times its standard deviation) when the
dependence between the index and the stock is given by a Gaussian copula (resp. a Student copula with ten degrees of freedom).
The row named P/P_{g/s} gives the average values of P/P_g and P/P_s over the 20 portfolios.


The main limitation of Patton’s approach comes from the fact that filtering
the data does not leave the dependence structure, i.e., the copula,
unchanged. Thus, the copula of the residuals is not the same as the copula
of the raw returns. Moreover, the copula of the residuals changes with
the chosen filter. Residuals are not the same when one filters the data with
an ARCH, a GARCH or a Multifractal Random Walk. In addition, for an
arbitrage-free market, the (multivariate) log-price process can be expressed
as a time changed multivariate Brownian motion⁹ [264], so that conditional
on the (realized) volatility [8, 38], the log-price process is nothing but a
multivariate Brownian motion. As a consequence, conditional on the volatility,
the multivariate distribution of returns should be Gaussian, and, therefore,
the copula of conditional returns should also be the Gaussian copula. Thus,
the estimation of the conditional copula does not really bring new insights.
Ultimately, the discrepancy between the Gaussian copula and the conditional copula
provided by some other model mainly highlights the weakness of the model
under consideration. This raises the question whether performing a model-free
analysis (without any pre-filtering process) is not a more satisfying alternative.
Obviously, the price to pay for such a model-free approach is a weakening
of the power of the statistical test due to the presence of (temporal)
dependence between data. There is no free lunch, neither on financial markets, nor
in statistics.
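
The statement that returns are Gaussian conditionally on the volatility while
their unconditional copula is not can be illustrated on a toy model. The sketch
below assumes a single random volatility factor common to both assets, with an
inverse chi-square law chosen so that the raw returns are jointly Student
distributed; this choice, like the function names, is an assumption made for
the illustration and not a model proposed in the text.

```python
import numpy as np


def simulate_returns(n, rho, nu, rng):
    """Return (raw returns, returns divided by their volatility) for two assets."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)    # Gaussian given the volatility
    vol = np.sqrt(nu / rng.chisquare(nu, size=n))             # common random volatility
    return z * vol[:, None], z


def joint_tail_frequency(x, q=0.01):
    """Empirical Pr[X2 below its q-quantile | X1 below its q-quantile]."""
    t1, t2 = np.quantile(x[:, 0], q), np.quantile(x[:, 1], q)
    below1 = x[:, 0] < t1
    return float(np.mean(x[below1, 1] < t2))
```

With a small ν, the joint tail frequency of the raw returns is markedly larger
than that of the volatility-normalized returns, although both series share the
same correlation matrix conditionally on the volatility.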


5.4 Summary 


The Gaussian paradigm has had a long life in finance. While it is now clear that
marginal distributions cannot be described by Gaussian laws, especially in
their tails (see Chap. 2), the dependence structure between two or more assets
is much less well known, and nothing suggests rejecting a priori the Gaussian
copula as a correct description of the observed dependence structure. In addition,
the Gaussian copula can be derived in a very natural way from a principle of
maximum entropy [265, 453].¹⁰ The Gaussian copula also has the advantage
of being the simplest possible one in the class of elliptical copulas, since it
is entirely specified by the knowledge of the correlation coefficients while, for
instance, Student’s copula requires in addition the specification of the number
of degrees of freedom. This has led to taking the Gaussian copula as a logical
starting point for the study of the dependence structure between financial
assets.

However, as recalled in Chap. 3, if the Gaussian and Student copulas are
very similar in their bulk, they become significantly different in their tails.
Concretely, the essential difference between the Gaussian and Student copulas
is that the former has independent extremes (in the sense of the asymptotic
tail dependence; see Chap. 4), while the latter generates concomitant extremes
with a non-zero probability which is all the larger, the smaller is the number
of degrees of freedom and the larger is the correlation coefficient. Thus, by
providing a slight departure from the Gaussian copula in the bulk of the
distributions, Student’s copula could also be a good candidate to model financial
dependencies. It turns out that it is indeed a good model for foreign exchange
rates. The situation is not so clear for stock returns, as Student’s copula does
not seem to perform significantly better than the Gaussian copula, both
being apparently approximations of the true copula. From a practical point of
view, there have been several efforts to find better copulas, but the obtained
gains are not clear. From an economic point of view, the reasons explaining
the difference between the dependence structure of the FX rates and that of the
stock returns remain to be found. The differences between the organization of
stock markets and of FX markets can be seen as an obvious reason, but direct links
between market organization and the distribution or copula of returns have not
yet been clearly articulated.


⁹ More precisely, in an arbitrage-free market, any n-dimensional square-integrable
log-price process ln p(t), with continuous sample path, satisfies


\[ r_\tau(t) = \ln p(t+\tau) - \ln p(t) = \int_t^{t+\tau} \mu(s)\, ds + \int_t^{t+\tau} \sigma(s)\, dW(s) , \]


where μ is a predictable n-dimensional vector and σ is an n-by-n matrix. W
denotes an n-dimensional standard Brownian motion.

¹⁰ For other examples of the determination of distributions using the principle of
maximum entropy, see [410].

One of the motivations in introducing the tail dependence coefficient λ is
to quantify the potential risks incurred in modeling the dependence structure
between assets with Gaussian copulas, for which λ = 0. Indeed, for assets with
large correlation coefficients, it may be dangerous to use Gaussian copulas as
long as one does not have a better idea of the value of the tail dependence
coefficient. Parametric models do not readily provide this information since
they fix the tail dependence coefficient and therefore do not provide an
independent test of whether λ is small (and indistinguishable from 0) or large.
To get further insight, nonparametric methods could thus be useful.

Nonparametric models have the advantage of being much more general 
since, by construction, they do not assume a specific copula and might thus 
allow for an independent determination of the tail dependence coefficient. 
Some of these methods have the advantage of leading to estimated copulas 
which are smooth and differentiable everywhere, which is convenient for the 
generation of random variables having the estimated copula, for sensitivity 
analysis and for the generation of synthetic scenarios [149]. However, this 
advantage comes with the main drawback that the tail dependence coefficient 
vanishes by construction. In sum, all methods mentioned until now suffer from 
the same problem of neglecting concomitant extremes. It thus seems that the 
use of copulas is not the easiest path to calibrate extreme events. We address 
this problem in the next chapter, in particular by describing direct methods 
for estimating extreme concomitant events. 
5.A Proof of the Existence of a χ²-Statistic
for Testing Gaussian Copulas


To prove Proposition 5.2.1, we first consider an n-dimensional random vector
X = (X_1, ..., X_n). Let us denote by F its distribution function and by F_i
the marginal distribution of each X_i. Let us now assume that the distribution
function F satisfies H_0, so that F has a Gaussian copula with correlation
matrix ρ while the F_i’s can be any distribution functions. According to Theorem
3.2.1, the distribution F can be represented as:


\[ F(x_1, \ldots, x_n) = \Phi_{\rho,n}\left(\Phi^{-1}(F_1(x_1)), \ldots, \Phi^{-1}(F_n(x_n))\right) . \tag{5.A.1} \]


Let us now transform the X_i’s into Normal random variables Y_i’s:


\[ Y_i = \Phi^{-1}(F_i(X_i)) . \tag{5.A.2} \]


Since the mapping $\Phi^{-1}(F_i(\cdot))$ is increasing, the invariance Theorem 3.2.2
allows us to conclude that the copula of the variables Y_i’s is identical to the
copula of the variables X_i’s. Therefore, the variables Y_i’s have Normal
marginal distributions and a Gaussian copula with correlation matrix ρ. Thus, by
definition, the multivariate distribution of the Y_i’s is the multivariate Gaussian
distribution with correlation matrix ρ:


\begin{align}
G(y) &= \Phi_{\rho,n}\left(\Phi^{-1}(F_1(x_1)), \ldots, \Phi^{-1}(F_n(x_n))\right) \tag{5.A.3} \\
     &= \Phi_{\rho,n}(y_1, \ldots, y_n) , \tag{5.A.4}
\end{align}


and Y is a Gaussian random vector. From (5.A.3-5.A.4), we have


\[ \rho_{ij} = \mathrm{Cov}\left[\Phi^{-1}(F_i(X_i)), \Phi^{-1}(F_j(X_j))\right] . \tag{5.A.5} \]


Consider now the random variable


\[ Z^2 = Y^t \rho^{-1} Y = \sum_{i,j=1}^{n} Y_i \, (\rho^{-1})_{ij} \, Y_j , \tag{5.A.6} \]


where $\cdot^t$ denotes the transpose operator. It is well known that the variable
Z² follows a χ²-distribution with n degrees of freedom. Indeed, since Y is
a Gaussian random vector with covariance matrix¹¹ ρ, it follows that the
components of the vector


\[ \tilde{Y} = A Y , \tag{5.A.7} \]


are independent standard Normal random variables. Here, A denotes the square root
of the matrix $\rho^{-1}$, obtained by the Cholesky decomposition, so that $A^t A = \rho^{-1}$.
Thus, the sum $\tilde{Y}^t \tilde{Y} = Z^2$ is the sum of the squares of n independent
standard Normal random variables, which follows a χ²-distribution with n degrees of
freedom.

¹¹ Up to now, the matrix ρ was named correlation matrix. But in fact, since the
variables Y_i’s have unit variance, their correlation matrix is also their covariance
matrix.
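
The argument of this appendix is easy to check numerically. The sketch below,
with illustrative function names and an arbitrary correlation matrix, builds a
matrix A such that A^t A = ρ^{-1} from the Cholesky factor of ρ^{-1} and
computes Z² as the squared norm of AY.

```python
import numpy as np


def whitening_matrix(rho):
    """Return A with A^t A = rho^{-1}, from the Cholesky factor of rho^{-1}."""
    lower = np.linalg.cholesky(np.linalg.inv(rho))   # rho^{-1} = lower @ lower.T
    return lower.T


def z_squared(y, rho):
    """Z^2 = Y^t rho^{-1} Y for each row of y, computed as |A Y|^2."""
    y_tilde = y @ whitening_matrix(rho).T            # each row is (A y)^t
    return np.sum(y_tilde ** 2, axis=1)


rho = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.3],
                [0.2, 0.3, 1.0]])
rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(3), rho, size=100_000)
z2 = z_squared(y, rho)   # sample mean close to n = 3, sample variance close to 2n = 6
```

The first two moments of a χ² law with n degrees of freedom, n and 2n, give a
quick sanity check before a formal comparison of distributions.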
5.B Hypothesis Testing with Pseudo Likelihood


Let us consider the iid sample {(x_1(1), x_2(1), ..., x_n(1)), ..., (x_1(T), x_2(T),
..., x_n(T))} drawn from the n-dimensional distribution F with copula C and
margins F_i. We aim at estimating the unknown copula C by use of the
semiparametric method presented in Sect. 5.1.2. Its pseudo log-likelihood reads


\[ \ln \mathcal{L}_T = \sum_{i=1}^{T} \ln c\left(\hat{u}_1(i), \ldots, \hat{u}_n(i); \theta\right) , \tag{5.B.8} \]


with $\hat{u}_k(i) = \hat{F}_k(x_k(i))$, where the $\hat{F}_k$’s are the empirical estimates of the
marginal distribution functions F_k’s, and c(·; θ) denotes the density of the copula
C_θ, θ ∈ Θ ⊂ R^p. The parameter vector θ can be estimated by maximization
of this pseudo log-likelihood, so that


\[ \hat{\theta}_T = \arg\max_{\theta} \ln \mathcal{L}\left(\{\hat{u}_1(i), \ldots, \hat{u}_n(i)\}; \theta\right) . \tag{5.B.9} \]
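
For the bivariate Gaussian copula, the maximization of the pseudo log-likelihood
can be sketched in a few lines. The example is only illustrative: the
pseudo-observations are taken as ranks divided by T + 1 (a common rescaling of
the empirical distribution function that keeps them strictly inside the unit
interval), ties are assumed absent, and the function names are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def pseudo_observations(x):
    """Rescaled ranks, column by column (no ties assumed)."""
    T = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (T + 1.0)


def gaussian_copula_pseudo_loglik(rho, u):
    """Pseudo log-likelihood of the bivariate Gaussian copula with correlation rho."""
    y1, y2 = stats.norm.ppf(u[:, 0]), stats.norm.ppf(u[:, 1])
    one_minus = 1.0 - rho ** 2
    log_c = (-0.5 * np.log(one_minus)
             - (rho ** 2 * (y1 ** 2 + y2 ** 2) - 2.0 * rho * y1 * y2) / (2.0 * one_minus))
    return float(np.sum(log_c))


def fit_gaussian_copula(x):
    """Pseudo maximum likelihood estimate of rho from a bivariate sample x."""
    u = pseudo_observations(x)
    res = optimize.minimize_scalar(lambda r: -gaussian_copula_pseudo_loglik(r, u),
                                   bounds=(-0.99, 0.99), method="bounded")
    return float(res.x)
```

Because only the ranks enter the estimator, any increasing transformation of
the margins leaves the estimate unchanged.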


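Under usual regularity conditions, it can be shown that $\hat{\theta}_T$ is a consistent
estimator of θ⁰, which is asymptotically Gaussian [197],


\[ \sqrt{T} \left(\hat{\theta}_T - \theta^0\right) \xrightarrow{\text{law}} \mathcal{N}\left(0, \Sigma^2\right) \tag{5.B.10} \]


with $\Sigma^2 = I(\theta^0)^{-1} + I(\theta^0)^{-1}\, \Omega\, I(\theta^0)^{-1}$, where I(θ⁰) represents Fisher’s
information matrix at θ⁰,


\[ \left[I(\theta^0)\right]_{ij} = \mathrm{E}\left[ \left. \frac{\partial \ln c(U;\theta)}{\partial \theta_i} \, \frac{\partial \ln c(U;\theta)}{\partial \theta_j} \right|_{\theta=\theta^0} \right] , \tag{5.B.11} \]


and U denotes an n-dimensional random vector with distribution function C
and with


\[ \Omega_{ij} = \mathrm{Cov}\left[ \sum_{k=1}^{n} W_{ki}(U_k), \; \sum_{k=1}^{n} W_{kj}(U_k) \right] , \tag{5.B.12} \]


where


\[ W_{ki}(U_k) = \int_{[0,1]^n} 1_{\{U_k \le u_k\}} \left. \frac{\partial^2 \ln c(u;\theta)}{\partial \theta_i \, \partial u_k} \right|_{\theta=\theta^0} dC\left(u;\theta^0\right) . \tag{5.B.13} \]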

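These results come from a straightforward application of the consistency and
asymptotic normality of functionals of multivariate rank statistics derived
by Ruymgaart et al. [423, 424] and Rüschendorf [422]. Indeed, concerning
asymptotic normality, the maximum pseudo likelihood estimator $\hat{\theta}_T$ satisfies:


\[ h_T(\hat{\theta}_T) = 0 , \tag{5.B.14} \]


where


\[ h_T(\theta) = \frac{1}{T} \sum_{i=1}^{T} \nabla_\theta \ln c\left(\hat{u}(i); \theta\right) . \tag{5.B.15} \]


Now, expanding h_T around θ⁰, we have


\[ h_T(\theta) = h_T(\theta^0) + A_T(\theta^0) \left(\theta - \theta^0\right) + \cdots \tag{5.B.16} \]


where A_T(θ) is the Jacobian matrix of h_T(θ), i.e., the Hessian matrix of the
normalized pseudo log-likelihood,


\[ \left(A_T(\theta)\right)_{ij} = \frac{1}{T} \sum_{k=1}^{T} \partial^2_{\theta_i \theta_j} \ln c\left(\hat{u}(k); \theta\right) . \tag{5.B.17} \]


Proposition A.1 in [197] provides a generalized form of the law of large numbers
for functionals of rank statistics, so that


\[ \left(A_T(\theta^0)\right)_{ij} \xrightarrow{\text{a.s.}} \mathrm{E}\left[ \left. \partial^2_{\theta_i \theta_j} \ln c(U;\theta) \right|_{\theta=\theta^0} \right] = -\left[I(\theta^0)\right]_{ij} , \tag{5.B.18} \]


where I(θ⁰) denotes Fisher’s information matrix (5.B.11). Evaluating (5.B.16)
at $\theta = \hat{\theta}_T$, one finally obtains


\[ \sqrt{T}\, h_T(\theta^0) = \sqrt{T}\, I(\theta^0) \left(\hat{\theta}_T - \theta^0\right) + o_p(1) , \tag{5.B.19} \]


as usual.
Proposition A.1 in [197] also states a generalized form of the central limit
theorem for functionals of rank statistics, which allows one to write


\[ \sqrt{T}\, h_T(\theta^0) \xrightarrow{\text{law}} \mathcal{N}\left(0, \Gamma(\theta^0)\right) , \tag{5.B.20} \]


where Γ(θ⁰) = I(θ⁰) + Ω. Then, (5.B.19-5.B.20) allow us to conclude that


\[ \sqrt{T} \left(\hat{\theta}_T - \theta^0\right) \xrightarrow{\text{law}} \mathcal{N}\left(0, \Sigma^2\right) , \tag{5.B.21} \]


where Σ² stands for $I(\theta^0)^{-1} + I(\theta^0)^{-1}\, \Omega\, I(\theta^0)^{-1}$.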

Since Ω is a positive definite matrix, the variance of the estimator $\hat{\theta}_T$ is
larger than it would be, were the marginal distributions F_i perfectly known.
Indeed, in such a case, the variance of the estimator would be nothing but the
inverse of Fisher’s information matrix, $I(\theta^0)^{-1}$.

Now, let us write the vector θ of parameters as follows:


\[ \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \tag{5.B.22} \]


with dim θ_1 = d and dim θ_2 = p − d. We would like to test the null hypothesis
according to which $\theta_1 = \theta_1^0$, i.e., $H_0 = \{\theta \in \Theta, \theta_1 = \theta_1^0\}$. In Mashal and
Zeevi’s approach [350], this amounts to testing H_0 = {(ν, Σ); ν = ∞}, where
ν denotes the number of degrees of freedom of the Student copula and Σ
its shape matrix.

If the likelihood ℒ was the actual likelihood, and not a pseudo likelihood,
the log-likelihood ratio test would allow us to test such a null hypothesis.
Indeed, under the null, Wilks’ theorem would hold and one would have


\[ \Lambda_T = 2 \left[ \ln \mathcal{L}_T(\hat{\theta}_T) - \sup_{\theta \in H_0} \ln \mathcal{L}_T(\theta) \right] \xrightarrow{\text{law}} \chi^2_d , \tag{5.B.23} \]


where χ²_d denotes the χ² distribution with d degrees of freedom (see Chap. 2,
Sect. 2.4.4).

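Unfortunately, this test does not apply with the pseudo likelihood, as previously
assumed [209, 372, 491]. Actually, expanding the pseudo log-likelihood
(5.B.8) around θ⁰ and accounting for (5.B.19), we obtain


\[ \ln \mathcal{L}_T(\hat{\theta}_T) = \ln \mathcal{L}_T(\theta^0) + \frac{T}{2} \left(\hat{\theta}_T - \theta^0\right)^t I(\theta^0) \left(\hat{\theta}_T - \theta^0\right) + o_p(1) . \tag{5.B.24} \]


Denoting by $\hat{\theta}_T^0$ the pseudo maximum likelihood estimator under the null
hypothesis (i.e., assuming $\theta_1 = \theta_1^0$):


\[ \hat{\theta}_T^0 = \arg\max_{\theta \in H_0} \ln \mathcal{L}_T(\theta) , \tag{5.B.25} \]


and expanding $h_T = T^{-1} \nabla_\theta \ln \mathcal{L}_T$ around θ⁰, which yields


\[ \frac{1}{\sqrt{T}} \nabla_\theta \ln \mathcal{L}_T(\hat{\theta}_T^0) = \sqrt{T}\, h_T(\theta^0) - \sqrt{T}\, I(\theta^0) \begin{pmatrix} 0 \\ \hat{\theta}_{2,T}^0 - \theta_2^0 \end{pmatrix} + o_p(1) , \tag{5.B.26} \]


the expansion of the pseudo log-likelihood around θ⁰ reads


\[ \ln \mathcal{L}_T(\hat{\theta}_T^0) = \ln \mathcal{L}_T(\theta^0) + \frac{T}{2} \left(\hat{\theta}_T^0 - \theta^0\right)^t I(\theta^0) \left(\hat{\theta}_T^0 - \theta^0\right) + o_p(1) . \tag{5.B.27} \]


The notation


\[ \hat{\theta}_T^0 = \begin{pmatrix} \theta_1^0 \\ \hat{\theta}_{2,T}^0 \end{pmatrix} \tag{5.B.28} \]


has been used in (5.B.26).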
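Λ_T, defined in (5.B.23), is now obtained by taking the difference between
(5.B.24) and (5.B.27):


\begin{align*}
\Lambda_T &= T \left(\hat{\theta}_T - \theta^0\right)^t I(\theta^0) \left(\hat{\theta}_T - \theta^0\right) - T \left(\hat{\theta}_T^0 - \theta^0\right)^t I(\theta^0) \left(\hat{\theta}_T^0 - \theta^0\right) + o_p(1) , \\
          &= T \left(\hat{\theta}_T - \hat{\theta}_T^0\right)^t I(\theta^0) \left(\hat{\theta}_T - \hat{\theta}_T^0\right) + 2T \left(\hat{\theta}_T^0 - \theta^0\right)^t I(\theta^0) \left(\hat{\theta}_T - \hat{\theta}_T^0\right) + o_p(1) ,
\end{align*}


where the last equality uses the fact that each term is a scalar and is thus
equal to its transpose.
Substituting (5.B.19) in (5.B.26) yields


\[ \frac{1}{\sqrt{T}} \nabla_\theta \ln \mathcal{L}_T(\hat{\theta}_T^0) = \sqrt{T}\, I(\theta^0) \left(\hat{\theta}_T - \hat{\theta}_T^0\right) + o_p(1) \tag{5.B.29} \]


and, left-multiplying by
$\sqrt{T} \left(\hat{\theta}_T^0 - \theta^0\right)^t = \sqrt{T} \begin{pmatrix} 0 \\ \hat{\theta}_{2,T}^0 - \theta_2^0 \end{pmatrix}^t$
shows (since the θ_2-components of the gradient vanish at the constrained maximum) that


\[ T \left(\hat{\theta}_T^0 - \theta^0\right)^t I(\theta^0) \left(\hat{\theta}_T - \hat{\theta}_T^0\right) = o_p(1) , \tag{5.B.30} \]


which allows us to conclude that


\[ \Lambda_T = T \left(\hat{\theta}_T - \hat{\theta}_T^0\right)^t I(\theta^0) \left(\hat{\theta}_T - \hat{\theta}_T^0\right) + o_p(1) . \tag{5.B.31} \]


Now, since (5.B.29) is equivalent to


\[ \sqrt{T} \left(\hat{\theta}_T - \hat{\theta}_T^0\right) = \frac{1}{\sqrt{T}}\, I(\theta^0)^{-1} \nabla_\theta \ln \mathcal{L}_T(\hat{\theta}_T^0) + o_p(1) , \tag{5.B.32} \]


we have:


\begin{align}
\Lambda_T &= \frac{1}{T} \nabla_\theta \ln \mathcal{L}_T(\hat{\theta}_T^0)^t \, I(\theta^0)^{-1} \, \nabla_\theta \ln \mathcal{L}_T(\hat{\theta}_T^0) + o_p(1) , \tag{5.B.33} \\
          &= T^{-1} \, \nabla_{\theta_1} \ln \mathcal{L}_T(\hat{\theta}_T^0)^t \left[I^{-1}\right]_{11} \nabla_{\theta_1} \ln \mathcal{L}_T(\hat{\theta}_T^0) + o_p(1) , \tag{5.B.34}
\end{align}


where $[I^{-1}]_{11}$ denotes the d × d submatrix of the d first rows and columns of
the inverse of I(θ⁰).
From (5.B.29) again, we have


\[ \frac{1}{\sqrt{T}} \left[I^{-1}\right]_{11} \nabla_{\theta_1} \ln \mathcal{L}_T(\hat{\theta}_T^0) = \sqrt{T} \left(\hat{\theta}_{1,T} - \theta_1^0\right) + o_p(1) , \tag{5.B.35} \]


so that


\begin{align}
\Lambda_T &= T^{-1} \, \nabla_{\theta_1} \ln \mathcal{L}_T(\hat{\theta}_T^0)^t \left[I^{-1}\right]_{11} \nabla_{\theta_1} \ln \mathcal{L}_T(\hat{\theta}_T^0) + o_p(1) \notag \\
          &= T \left(\hat{\theta}_{1,T} - \theta_1^0\right)^t \left\{\left[I^{-1}\right]_{11}\right\}^{-1} \left(\hat{\theta}_{1,T} - \theta_1^0\right) + o_p(1) . \tag{5.B.36}
\end{align}


Now, by (5.B.21), we have


\[ \sqrt{T} \left(\hat{\theta}_{1,T} - \theta_1^0\right) \xrightarrow{\text{law}} \mathcal{N}\left(0, \left[\Sigma^2\right]_{11}\right) . \tag{5.B.37} \]


Denoting by B a symmetric positive definite matrix such that


\[ B \cdot B = \left[\Sigma^2\right]_{11} \tag{5.B.38} \]


and by ξ_d a d-dimensional standard Gaussian vector, we obtain:


\[ \Lambda_T = \xi_d^t \, B \left\{\left[I^{-1}\right]_{11}\right\}^{-1} B \, \xi_d + o_p(1) . \tag{5.B.39} \]


As a consequence,


\[ \Lambda_T \nrightarrow \chi^2_d , \quad \text{as } T \to \infty \tag{5.B.40} \]


unless


\[ B \left\{\left[I^{-1}\right]_{11}\right\}^{-1} B = \mathrm{Id}_d , \tag{5.B.41} \]


which holds when Ω = 0, for instance. Therefore, when one resorts to the
pseudo likelihood instead of the actual likelihood, the asymptotic distribution
of Λ_T is not a simple χ² distribution and the log-likelihood ratio test
becomes impracticable. In the particular case where dim θ_1 = 1, as in [350],
$B \{[I^{-1}]_{11}\}^{-1} B$ is a scalar so that Λ_T follows a χ² distribution with one
degree of freedom, up to the scale factor.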
Measuring Extreme Dependences


In this chapter, we investigate the relative information content of several mea- 
sures of dependence between two random variables X and Y in various models 
of financial series. We consider measures of dependence especially defined for 
large and extreme events. These measures of dependence are of two types: (i) 
unconditional such as with the coefficient of tail dependence already intro- 
duced in Chap. 4 and (ii) conditional such as with the correlation coefficient 
conditional over a given threshold. The introduction of conditioning over val- 
ues of one or both variables reaching above some threshold is a natural ap- 
proach to discriminate the dependence in the tails. It explodes the concept 
of dependence into a multidimensional set of measures, each adapted to cer- 
tain ranges spanned by the random variables. We present explicit analytical 
formulas as well as numerical and empirical estimations for these measures of 
dependence. The main overall insight is that conditional measures of depen- 
dence may be very different from the unconditional ones and can often lead 
to paradoxical interpretations, whose origins are explained in detail. 

When the dependence properties are studied as a function of time, one can 
often observe that conditional measures vary with time. Such time variation 
has initiated a vigorous discussion in the literature on its possible economic 
meaning. We review how conditioning provides a straightforward and general 
mechanism for explaining changes of correlations based 
on changes of volatility or of trends: for a given conditional threshold, if the 
volatility of one or both time series changes in some time interval, then the 
corresponding quantiles sampled in the conditional measure will also change; 
as a result, the conditional measure will not sample the same part of the tails 
of the distributions, effectively changing the definition of the conditional mea- 
sure. In this explanation, the variation with time of conditional measures of 
dependence results solely from a change of volatility but does not reflect a gen- 
uine change of dependence. In other words, a constant dependence structure 
together with time-varying volatility may give rise to changing conditional 
measures of dependence, which would be incorrectly interpreted as reflecting 
genuine changes of dependence. Thus, tools based upon conditional quantities 
should be used with caution since conditioning alone induces a change in the 
dependence structure which has nothing to do with a genuine change of un- 
conditional dependence. In this respect, for its stability, the coefficient of tail 
dependence should be preferred to the conditional correlations. Moreover, 
the various measures of dependence exhibit different and sometimes opposite 
behaviors, showing that extreme dependence properties possess a multidimen- 
sional character that can be revealed in various ways. 

As an illustration, the theoretical results and their interpretation presented 
below are applied to the controversial contagion problem across Latin Amer- 
ican markets during the turmoil periods associated with the Mexican crisis 
in 1994 and with the Argentinean crisis that started in 2001. The analysis of 
several measures of dependence between the Argentinean, Brazilian, Chilean 
and Mexican markets shows that the above conditioning effect does not fully 
explain the behavior of the Latin American stock indexes, confirming the ex- 
istence of a possible genuine contagion. Our analysis below suggests that the 
1994 Mexican crisis has spread over to Argentina and Brazil through conta- 
gion mechanisms and to Chile only through co-movements. Concerning the 
recent Argentinean crisis that started in 2001, no evidence of contagion to the 
other Latin American countries (except perhaps in the direction of Brazil) 
can be found but significant co-movements are identified. 

The chapter is organized as follows. Sect. 6.1 motivates the whole chap- 
ter by presenting a number of historically important cases which suggested 
to previous authors that, “during major market events, correlations change 
dramatically” [71]. This section then offers a review of the different existing 
viewpoints on conditional dependences. 

Section 6.2 describes three conditional correlation coefficients: 

• the correlation $\rho_v^+$ conditioned on signed exceedance of one variable, 

• the correlation $\rho_u$ conditioned on signed exceedance of both variables, and 

• the correlation $\rho_v^s$ conditioned on the exceedance of the absolute value of 
one variable (amounting to a conditioning on large values of the volatility). 


Boyer et al. [78] have provided the general expression of $\rho_v^+$ and $\rho_v^s$ for the 
Gaussian bivariate model, which we use to derive their $v$ dependence for large 
thresholds $v$. This analysis shows that, for a given distribution, the conditional 
correlation coefficient changes even if the unconditional correlation is 
left unchanged, and the nature of this change depends on the conditioning set. 
We then give the general expression of $\rho_v^+$ and $\rho_v^s$ for the Student’s bivariate 
model with $\nu$ degrees of freedom and for the factor model $X = \beta Y + \epsilon$, for 
arbitrary distributions of $Y$ and $\epsilon$. By comparison with the Gaussian model, 
these expressions exemplify that, for a fixed conditioning set, the behavior of 
the conditional correlation changes dramatically from one distribution to another. 
Conditioning on both variables, we give the asymptotic dependence 
of $\rho_u$ for the bivariate Gaussian model and show that it essentially behaves 
like $\rho_v^+$. Applying these results to the Latin American stock indexes, we find 
that one cannot entirely explain the behavior of the conditional correlation 
coefficient for these markets by the conditioning effect, suggesting the existence 
of a possible genuine contagion as mentioned above. 

In Sect. 6.3, to account for several deficiencies of the correlation coefficient, 
we study an alternative measure of dependence, the conditional rank corre- 
lation (Spearman’s rho) which, in its unconditional form, is related to the 
probability of concordance and discordance of several events drawn from the 
same probability distribution, as recalled in Chap. 4. This measure provides 
an important improvement with respect to the correlation coefficient since it 
only takes into account the dependence structure of the variable and is not 
sensitive to the marginal behavior of each variable. Numerical computations 
allow us to derive the behavior of the conditional Spearman’s rho, denoted 
by $\rho_s(v)$. This allows us to show that there is no direct relation between the 
Spearman’s rho conditioned on large values and the correlation coefficient 
conditioned on the same values. Therefore, each of these coefficients quantifies 
a different kind of extreme dependence. Then, calibrating the models on 
the Latin American market data confirms that the conditional effect cannot 
fully explain the observed dependence and that contagion can therefore be invoked. 
These results are much clearer for the conditional Spearman’s rho than 
for the conditional (linear) correlation coefficient, due to the greater impact of 
large statistical fluctuations in the latter. 

Section 6.4 discusses the tail-dependence parameters $\lambda$ and $\bar{\lambda}$ introduced 
in Chap. 4. Applying the procedure of [390], we estimate nonparametrically 
the tail dependence coefficients. We find them significant and thus conclude 
that, with or without contagion mechanism, extreme co-movements must nat- 
urally occur on the various Latin American markets as soon as one of them 
undergoes a crisis. 

Section 6.5 provides a comparison between these different results and a 
synthesis. A first important message is that there is no unique measure of 
extreme dependence. Each of the coefficients of extreme dependence that we 
have presented provides a specific quantification that is sensitive to a certain 
combination of the marginals and of the copula of the two random variables. 
Similarly to risks whose adequate characterization requires an extension be- 
yond the restricted one-dimensional measure in terms of the variance (volatil- 
ity) to include the knowledge of the full distribution, tail-dependence has 
also a multidimensional character. A second important message is that the 
increase of some of the conditional coefficients of extreme dependence when 
weighting more and more the extreme tail range does not necessarily signal 
a genuine increase of the unconditional correlation or dependence between 
the two variables. The calculations presented here firmly confirm that this 
increase is a general and unavoidable result of the statistical properties of 
many multivariate models of dependence. From the standpoint of the con- 
tagion across Latin American markets, the theoretical and empirical results 
suggest an asymmetric contagion phenomenon from Chile and Mexico towards 
Argentina and Brazil: large moves of the Chilean and Mexican markets tend 
to propagate to Argentina and Brazil through contagion mechanisms, i.e., 
with a change in the dependence structure, while the converse does not hold. 
As a consequence, this seems to prove that the 1994 Mexican crisis had spread 
over to Argentina and Brazil through contagion mechanisms and to Chile only 
through co-movements. Concerning the more recent Argentinean crisis start- 
ing in 2001, no evidence of contagion to the other Latin American countries 
is found (except perhaps in the direction of Brazil) and only co-movements 
can be identified. 


6.1 Motivations 


6.1.1 Suggestive Historical Examples 


The 19 October, 1987, stock-market crash stunned Wall Street professionals, 
hacked about $1 trillion off the value of all U.S. stocks, and elicited predictions 
of another Great Depression. On “Black Monday,” the Dow Jones industrial 
average plummeted 508 points, or 22.6 percent, to 1,738.74. Contrary to com- 
mon belief, the US was not the first to decline sharply. Non-Japanese Asian 
markets began a severe decline on 19 October, 1987, their time, and this 
decline was echoed first on a number of European markets, then in North 
America, and finally in Japan. However, most of the same markets had experienced 
significant but less severe declines in the latter part of the previous 
week. With the exception of the US and Canada, other markets continued 
downward through the end of October, and some of these declines were as 
large as the great crash on 19 October. 

On 19 December, 1994, the Mexican government, facing a solvency crisis, 
chose to devalue the peso and abandoned its exchange rate parity with 
the dollar. This devaluation plunged the country into a major financial crisis 
which quickly propagated to the rest of the Latin American countries. 

From July 1997 to December 1997, several East Asian markets crashed, 
starting with the Thai market on 2 July, 1997 and ending with the Hong Kong 
market on 17 October, 1997. After this regional event, the turmoil spread over 
to the American and European markets. 

The “slow” crash and in particular the turbulent behavior of the stock 
markets worldwide starting mid-August 1998 are widely associated with and 
even attributed to the plunge of the Russian financial markets, the devaluation 
of its currency and the default of the government on its debt obligations. 

The Nasdaq Composite index dropped precipitously with a low of 3227 on 
17 April, 2000, corresponding to a cumulative loss of 37% counted from its 
all-time high of 5133 reached on 10 March, 2000. The drop was mostly driven 
by the so-called “New Economy” stocks which had risen nearly four-fold over 
1998 and 1999 compared to a gain of only 50% for the Standard & Poor’s 500 
index. And without technology, this benchmark would be flat. 

All these events epitomize the observation often reported by market profes- 
sionals that, “during major market events, correlations change dramatically” 
[71], as mentioned above. The possible existence of changes of correlation, or 
more precisely of changes of dependence, between assets and between markets 
in different market phases has obvious implications in risk assessment, port- 
folio management and in the way policy and regulation should be formulated. 
Concerning portfolio management, these questions related to state-varying- 
dependence are important for practical applications since in such a case the 
optimal portfolio will also become state-dependent. Neglecting this effect can 
lead to very inefficient asset allocations [14, 15]. In this spirit, the Argentinean 
crisis in 2001 has triggered fears of a contagion to other Latin American mar- 
kets. Also, the Enron financial scandal at the end of 2001 seems to have opened 
a flux of similar bankruptcies in other “new economy” companies. 


6.1.2 Review of Different Perspectives 


In the academic world, all these manifestations of propagating crises have 
given birth to an intense activity concerning the notion of contagion (see 
[102] for a review). According to the most commonly accepted definition, contagion 
is characterized by an increase in the correlation (or, more generally, 
dependence) across markets during periods of turmoil. In fact, as we shall see, 
there are two distinct classes of mechanisms for understanding “changes of 
correlations,” not necessarily mutually exclusive. 


• It is possible that there are genuine changes with time of the unconditional 
(with respect to amplitudes) correlations and thus of the underlying 
structure of the dynamical processes, as observed by identifying shifts 
in ARMA-ARCH/GARCH processes [440], in regime-switching models 
[14, 15] or in contagion models [395, 396]. Many workers (see for instance 
[314, 477]) have shown that the hypothesis of a constant conditional correlation 
for stock returns or international equity returns must be rejected. 
In fact, there is strong evidence that the correlations are not only time-dependent 
but also state-dependent. Indeed, as shown in [271, 397], the 
correlations increase in periods of large volatility. Moreover, Longin and 
Solnik [315] have shown that the correlations across international equity 
markets are also trend-dependent. 

• A second class of explanation is that correlations between two variables 
conditioned on signed exceedance (one-sided) or on absolute value (volatility) 
exceedance of one or both variables may deviate significantly from the 
unconditional correlation [78, 316, 317]. In other words, with a fixed unconditional 
correlation $\rho$, the measured correlation conditional on a given 
bullish trend, bearish trend, high or low market volatility, may in general 
differ from $\rho$ and can be viewed as a function of the specific market phase. 
According to this explanation, changes of correlation may be only a fallacious 
appearance that stems from a change of volatility or a change of 
trend of the market and not from a real change of unconditional correlation 
or dependence. 
The existence of the second class of explanation is appealing by its parsimony, 
as it posits that observed “changes of correlation” may simply result 
from the way the measure of dependence is performed. This approach has 
been followed by several authors but is often open to misinterpretation, as 
stressed in [178]. In addition, it may also be misleading since it does not 
provide a signature or procedure for identifying the existence of a genuine 
contagion phenomenon, if any. Therefore, in order to clarify the situation and 
eventually develop more adequate tools for probing the dependences between 
assets and between markets, it is highly desirable to characterize the different 
possible ways with which higher or lower conditional dependence can occur in 
models with constant unconditional dependence. In order to make progress, it 
is necessary to first distinguish between the different measures of dependence 
between two variables for large or extreme events that have been introduced 
in the literature. This is because the conclusions that one can draw about 
the variability of dependence are sensitive to the choice of its measure. These 
measures include the following. 


1. The correlation conditioned on signed exceedance of one or both variables 
[101, 78, 316, 317] that we call respectively $\rho_v^+$ and $\rho_u$, where $u$ and $v$ 
denote the thresholds above which the exceedances are calculated. 

2. The correlation conditioned on absolute value exceedance (or large volatility), 
above the threshold $v$, of one or both variables [101, 78, 316, 317] 
that we call $\rho_v^s$ (for a condition of exceedance on one variable). 

3. The local correlation (whose definition is given in Sect. 4.1.2), which is 
immune to the biases associated with the two aforementioned conditional 
correlation coefficients. Bradley and Taqqu have used it to introduce a 
new diagnostic of contagion: contagion from market X to market Y is 
qualified if there is more dependence between X and Y when X is doing 
badly than when X exhibits typical performance, that is, if there is more 
dependence at the loss tail distribution of X than at its center [80, 81, 82]. 

4. The tail-dependence parameter $\lambda$, which has a simple analytical expression 
when using copulas [149, 147] such as the Gumbel copula [315], and whose 
estimation provides useful information about the occurrence of extreme 
co-movements [260, 334, 390]. 

5. The spectral measure associated with the tail index (assumed to be the 
same for all assets) of extreme value multivariate distributions [43, 224, 
462]. 

6. Tail indices of extremal correlations defined as the upper or lower corre- 
lation of exceedances of ordered log-values [395]. 

7. Confidence weighted forecast correlations [53] or algorithmic complexity 
measures [342]. 


The contribution of this chapter is both methodological and empirical. On 
the methodological front, first of all, we review the existing tools available 
for probing the dependence between large or extreme events for several mod- 
els of interest for financial time series; second, we provide explicit analytical 
expressions for these measures of dependence between two variables; third, 
this allows us to quantify the misleading interpretations of certain conditional 
coefficients commonly used for exploring the evolution of the dependence as- 
sociated with a change in the market conditions (an increase of the volatility, 
for instance). On the empirical front, the theoretical results are applied to the 
controversial problem of the occurrence or absence of a contagion phenomenon 
across Latin American markets during the turmoil period associated with the 
Mexican crisis in 1994 or with the recent Argentinean crisis in 2001. For this 
purpose, the novel insight derived from the analysis of several measures of 
dependence is applied to the question of a possible evolution of the depen- 
dence between the Argentinean, Brazilian, Chilean and Mexican markets with 
respect to the market conditions. 

The dependence measures discussed below are the conditional correlation 
coefficients $\rho_v^+$, $\rho_v^s$, $\rho_u$, the conditional Spearman’s rho $\rho_s(v)$ and the tail 
dependence coefficients $\lambda$ and $\bar{\lambda}$, whose properties have been summarized in 
Chap. 4, for several models among which are the bivariate Gaussian distribution, 
the bivariate Student’s distribution, and the one factor model for various 
distributions of the factor. A priori, one could hope for the existence of logical 
links between some of these measures, for instance that a vanishing tail-dependence 
parameter $\lambda$ implies vanishing asymptotic conditional correlation coefficients. 
In fact, this turns out to be wrong and one can construct simple examples 
for which all possible combinations occur. Therefore, each of these measures 
probes a different quality of the dependence between two variables for large or 
extreme events. In addition, even if the conditional correlation coefficients are 
asymptotically zero, they decay in general extremely slowly, as inverse powers 
of the value of the threshold, and may thus remain significant for most practical 
applications. These results will allow us to assert that, somewhat similarly 
to risks whose adequate characterization requires an extension beyond the 
restricted one-dimensional measure in terms of the variance to include all 
higher order cumulants or more generally the knowledge of the full distribution 
[453, 6], large and/or extreme dependences also have a multidimensional 
character. 


6.2 Conditional Correlation Coefficient 


In this section, we discuss the properties of the correlation coefficient con- 
ditioned on one variable. We study the difference between conditioning on 
the signed values or on absolute values of the variable (conditioning on the 
absolute value of the variable of interest is only meaningful when its distrib- 
ution is symmetric). This allows us to conclude that conditioning on signed 
values generally provides more information than conditioning on absolute val- 
ues. Moreover, as already underlined for instance by Boyer et al. [78], the 
conditional correlation coefficient is shown to suffer from a bias which forbids 
its use as a measure of change in the correlation between two assets when 
the volatility increases (many papers on contagion unfortunately use the conditional 
correlation coefficient as a probe to detect changes of dependence). 
We then present an empirical illustration of the evolution of the correlation 
between several stock indexes of Latin American markets. 


6.2.1 Definition 


Let us consider the correlation coefficient $\rho_A$ of two real random variables $X$ 
and $Y$ conditioned on $Y \in A$, where $A$ is a subset of $\mathbb{R}$ such that 
$\Pr\{Y \in A\} > 0$. By definition, the conditional correlation coefficient $\rho_A$ is given by 

$$\rho_A = \frac{\mathrm{Cov}(X, Y \mid Y \in A)}{\sqrt{\mathrm{Var}(X \mid Y \in A) \cdot \mathrm{Var}(Y \mid Y \in A)}}. \tag{6.1}$$

This general expression of the conditional correlation coefficient can be transformed 
into closed-form formulas for several standard distributions and models. This 
will allow us to investigate the influence of the conditioning set and the underlying 
model on the behavior of $\rho_A$. 
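As a hedged illustration of definition (6.1), the following Python sketch estimates $\rho_A$ from paired observations by keeping only the pairs with $Y \in A$ and computing the ordinary sample correlation on that subset. The function name `conditional_correlation` and the predicate argument `in_A` are illustrative choices, not notation from the text.

```python
import math


def conditional_correlation(xs, ys, in_A):
    """Sample estimate of rho_A in (6.1): correlation of (X, Y) given Y in A.

    xs, ys : equal-length sequences of paired observations.
    in_A   : predicate on y (hypothetical helper) encoding the conditioning set A.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if in_A(y)]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two observations with Y in A")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("conditional variance is zero; correlation undefined")
    return cov / math.sqrt(var_x * var_y)


# Toy data: the pair (5, 0) is excluded by the condition Y >= 2,
# and the remaining pairs lie exactly on the line x = y - 1.
xs = [5.0, 1.0, 2.0, 4.0, 3.0]
ys = [0.0, 2.0, 3.0, 5.0, 4.0]
rho_cond = conditional_correlation(xs, ys, lambda y: y >= 2.0)
rho_uncond = conditional_correlation(xs, ys, lambda y: True)
```

On this toy data the conditional estimate differs from the unconditional one only because the conditioning set discards one observation, which is the effect studied in the rest of the section.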


6.2.2 Influence of the Conditioning Set 


Let the variables $X$ and $Y$ have a multivariate Gaussian distribution with 
(unconditional) correlation coefficient $\rho$. The following result has been proved 
[78]: 

$$\rho_A = \frac{\rho}{\sqrt{\rho^2 + (1-\rho^2)\,\dfrac{\mathrm{Var}(Y)}{\mathrm{Var}(Y \mid Y \in A)}}}. \tag{6.2}$$

Note that $\rho$ and $\rho_A$ have the same sign, that $\rho_A = 0$ if and only if $\rho = 0$ 
and that $\rho_A$ does not depend directly on $\mathrm{Var}(X)$. Note also that $\rho_A$ can be 
either greater or smaller than $\rho$ since $\mathrm{Var}(Y \mid Y \in A)$ can be either greater 
or smaller than $\mathrm{Var}(Y)$. Let us illustrate this property in the two following 
examples, with a conditioning on large positive (or negative) returns and a 
conditioning on large volatility. The difference comes from the fact that in the 
first case, one accounts for the trend while one neglects this information in 
the second case. 

These two simple examples will show that, in the case of two Gaussian random 
variables, the two conditional correlation coefficients $\rho_v^+$ and $\rho_v^s$ exhibit 
opposite behaviors since the conditional correlation coefficient $\rho_v^+$ is a decreasing 
function of the conditioning threshold $v$ (and goes to zero as $v \to +\infty$) 
while the conditional correlation coefficient $\rho_v^s$ is an increasing function of $v$ 
and goes to one as $v \to \infty$. These opposite behaviors seem very general and 
do not depend on the particular choice of the joint distribution of $X$ and $Y$, 
namely the Gaussian distribution studied until now, as will be seen in the 
sequel. 

This result underlines the importance of the choice of the conditioning 
set with the following two caveats that we stress again. First, as already 
stressed by many authors, the conditional correlation $\rho_v^+$ or $\rho_v^s$ changes with 
the value of the threshold $v$ even if the unconditional correlation $\rho$ remains 
unchanged. Thus, the observation of a change in the conditional correlation 
does not provide a reliable signature of a change in the true (unconditional) 
correlation. Second, the conditional correlations can exhibit really opposite 
behaviors depending on the conditioning sets. Specifically, accounting for a 
signed trend or only for its amplitude may yield a decrease or an increase 
of the conditional correlation with respect to the unconditional one, so that 
these changes cannot be interpreted as a strengthening or a weakening of the 
correlations. 


Example 1: Conditioning on Large (Positive) Returns 


Let us first consider the conditioning set $A = [v, +\infty)$, with $v \in \mathbb{R}_+$. Thus 
$\rho_A$ is the correlation coefficient conditioned on the returns $Y$ larger than a 
given positive threshold $v$. It will be denoted by $\rho_v^+$ in the sequel. Assuming for 
simplicity, but without loss of generality, that $\mathrm{Var}(Y) = 1$, an exact calculation 
given below shows that, for large $v$, 

$$\rho_v^+ \sim_{v \to \infty} \frac{\rho}{\sqrt{1-\rho^2}} \cdot \frac{1}{|v|}, \tag{6.3}$$

which slowly goes to zero as $v$ goes to infinity. Obviously, by symmetry, the 
conditional correlation coefficient $\rho_v^-$, conditioned on $Y$ smaller than $v$ (with $v$ 
now negative), obeys the same formula. 


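Proof. We start with the calculation of the first and the second moments of 
$Y$ conditioned on $Y$ larger than $v$: 

$$E(Y \mid Y > v) = \frac{\sqrt{2}}{\sqrt{\pi}\, e^{v^2/2}\, \mathrm{erfc}\left(\frac{v}{\sqrt{2}}\right)} = v + \frac{1}{v} - \frac{2}{v^3} + O\left(\frac{1}{v^5}\right), \tag{6.4}$$

$$E(Y^2 \mid Y > v) = 1 + \frac{\sqrt{2}\, v}{\sqrt{\pi}\, e^{v^2/2}\, \mathrm{erfc}\left(\frac{v}{\sqrt{2}}\right)} = v^2 + 2 - \frac{2}{v^2} + O\left(\frac{1}{v^4}\right), \tag{6.5}$$

which allows us to obtain the variance of $Y$ conditioned on $Y$ larger than $v$: 

$$\mathrm{Var}(Y \mid Y > v) = 1 + \frac{\sqrt{2}\, v}{\sqrt{\pi}\, e^{v^2/2}\, \mathrm{erfc}\left(\frac{v}{\sqrt{2}}\right)} - \frac{2}{\pi\, e^{v^2}\, \mathrm{erfc}\left(\frac{v}{\sqrt{2}}\right)^2} = \frac{1}{v^2} + O\left(\frac{1}{v^4}\right), \tag{6.6}$$

which together with (6.2) yields (6.3) for large $v$. 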
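The exact expressions above can be evaluated numerically to see how quickly the leading-order behavior (6.3) sets in. The sketch below assumes a standard Gaussian $Y$ (zero mean, unit variance), as in the example; the function names and the value $\rho = 0.5$ are illustrative assumptions.

```python
import math


def truncated_mean(v):
    """E(Y | Y > v) for a standard Gaussian Y: exact expression in (6.4)."""
    return (math.sqrt(2.0 / math.pi) * math.exp(-0.5 * v * v)
            / math.erfc(v / math.sqrt(2.0)))


def var_given_above(v):
    """Var(Y | Y > v): exact expression in (6.6), i.e. (6.5) minus (6.4) squared."""
    m = truncated_mean(v)
    return 1.0 + v * m - m * m


def rho_plus_exact(rho, v):
    """rho_v^+ from (6.2) with Var(Y) = 1 and A = [v, +inf)."""
    return rho / math.sqrt(rho * rho + (1.0 - rho * rho) / var_given_above(v))


def rho_plus_asymptotic(rho, v):
    """Leading-order behavior (6.3), valid for large v."""
    return rho / (math.sqrt(1.0 - rho * rho) * abs(v))


rho = 0.5  # illustrative unconditional correlation
for v in (1.0, 2.0, 4.0, 8.0):
    print(v, rho_plus_exact(rho, v), rho_plus_asymptotic(rho, v))
```

The thresholds are kept moderate on purpose: for very large $v$ both `exp` and `erfc` underflow in double precision, so an asymptotic expansion would be the safer route there.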


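Example 2: Conditioning on Large Volatilities 


Let now the conditioning set be $A = (-\infty, -v] \cup [v, +\infty)$, with $v \in \mathbb{R}_+$. 
Thus $\rho_A$ is the correlation coefficient conditioned on $|Y|$ larger than $v$, i.e., 
it is conditioned on a large volatility of $Y$. Still assuming $\mathrm{Var}(Y) = 1$, this 
correlation coefficient is denoted by $\rho_v^s$ and, for large $v$, 

$$\rho_v^s \sim_{v \to \infty} \frac{\rho}{\sqrt{\rho^2 + \dfrac{1-\rho^2}{2+v^2}}} \sim_{v \to \infty} \mathrm{sgn}(\rho) \cdot \left(1 - \frac{1}{2}\,\frac{1-\rho^2}{\rho^2}\,\frac{1}{v^2}\right), \tag{6.7}$$

which goes to (plus or minus) 1 as $v$ goes to infinity according to 
$1 - |\rho_v^s| \sim_{v \to \infty} \frac{1-\rho^2}{2\rho^2}\, v^{-2}$. 


Proof. The correlation coefficient conditioned on $|Y|$ larger than $v$ can be 
written 

$$\rho_v^s = \frac{\rho}{\sqrt{\rho^2 + \dfrac{1-\rho^2}{\mathrm{Var}(Y \mid |Y| > v)}}}. \tag{6.8}$$

The first and second moments of $Y$ conditioned on $|Y|$ larger than $v$ can be 
easily calculated: 

$$E(Y \mid |Y| > v) = 0, \tag{6.9}$$

$$E(Y^2 \mid |Y| > v) = 1 + \frac{\sqrt{2}\, v}{\sqrt{\pi}\, e^{v^2/2}\, \mathrm{erfc}\left(\frac{v}{\sqrt{2}}\right)} = v^2 + 2 - \frac{2}{v^2} + O\left(\frac{1}{v^4}\right). \tag{6.10}$$

Expression (6.10) is the same as (6.5), as it should be. This gives the following 
conditional variance: 

$$\mathrm{Var}(Y \mid |Y| > v) = 1 + \frac{\sqrt{2}\, v}{\sqrt{\pi}\, e^{v^2/2}\, \mathrm{erfc}\left(\frac{v}{\sqrt{2}}\right)} = v^2 + 2 + O\left(\frac{1}{v^2}\right), \tag{6.11}$$

and finally yields (6.7), for large $v$. 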
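The opposite behaviors of the two conditionings can also be checked by simulation. The following sketch draws one sample of bivariate Gaussian pairs with a fixed unconditional correlation and then measures the correlation on the two conditioning sets of Examples 1 and 2. The seed, sample size, $\rho = 0.5$ and $v = 1.5$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math
import random


def sample_corr(pairs):
    """Ordinary sample correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def simulate_gaussian_pairs(rho, n, seed=12345):
    """Draw n pairs (X, Y) of standard Gaussians with correlation rho."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        x = rho * y + scale * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


rho, v = 0.5, 1.5  # illustrative values
pairs = simulate_gaussian_pairs(rho, 200_000)
rho_all = sample_corr(pairs)                               # unconditional
rho_plus = sample_corr([p for p in pairs if p[1] > v])     # Y > v
rho_s = sample_corr([p for p in pairs if abs(p[1]) > v])   # |Y| > v
print(rho_all, rho_plus, rho_s)
```

For comparison, inserting the exact conditional variances into (6.2) gives approximately 0.218 for $\rho_v^+$ and 0.752 for $\rho_v^s$ at these parameter values, although the dependence structure of the simulated pairs never changes.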


Intuitive Meaning 


Let us provide an intuitive explanation (see also [315]). As seen from (6.2), $\rho_v^+$ 
is controlled by $\mathrm{Var}(Y \mid Y > v) \propto 1/v^2$ derived in Example 1. In contrast, 
as seen from (6.8), $\rho_v^s$ is controlled by $\mathrm{Var}(Y \mid |Y| > v) \propto v^2$ given in 
Example 2. The difference between $\rho_v^+$ and $\rho_v^s$ can thus be traced back to that 
between $\mathrm{Var}(Y \mid Y > v) \propto 1/v^2$ and $\mathrm{Var}(Y \mid |Y| > v) \propto v^2$ for large $v$. 

This results from the following effect. For $Y > v$, one can picture the 
possible realizations of $Y$ as those of a random particle on the line, which is 
strongly attracted to the origin by a spring (the Gaussian distribution that 
prevents $Y$ from performing significant fluctuations beyond a few standard 
deviations) while being forced to stay to the right of a wall at $Y = v$. It is 
clear that the fluctuations of the position of this particle are very small as 
it is strongly glued to the impenetrable wall by the restoring spring, hence 
the result $\mathrm{Var}(Y \mid Y > v) \propto 1/v^2$. In contrast, for the condition $|Y| > v$, by 
the same argument, the particle is constrained to be very 
close to $|Y| = v$, i.e., very close to $Y = +v$ or $Y = -v$. Thus, the fluctuations 
of $Y$ typically flip from $-v$ to $+v$ and vice versa. It is thus not surprising to 
find $\mathrm{Var}(Y \mid |Y| > v) \propto v^2$. 

This argument makes intuitive the results $\mathrm{Var}(Y \mid Y > v) \propto 1/v^2$ and 
$\mathrm{Var}(Y \mid |Y| > v) \propto v^2$ for large $v$ and thus the results for $\rho_v^+$ and for $\rho_v^s$ 
if we use (6.2) and (6.8). We now attempt to justify $\rho_v^+ \sim_{v \to \infty} 1/v$ and 
$1 - \rho_v^s \sim_{v \to \infty} 1/v^2$ directly by the following intuitive argument. Using the picture 
of particles, $X$ and $Y$ can be visualized as the positions of two particles which 
fluctuate randomly. Their joint bivariate Gaussian distribution with nonzero 
unconditional correlation amounts to the existence of a spring that ties them 
together. Their Gaussian marginals also exert a spring-like force attaching 
them to the origin. When $Y > v$, the X-particle is torn between two 
extremes, between 0 and $v$. When the unconditional correlation $\rho$ is less than 
1, the spring attracting to the origin is stronger than the spring attracting 
to the wall at $v$. The particle $X$ thus undergoes tiny fluctuations around the 
origin that are relatively less and less attracted by the Y-particle, hence the 
result $\rho_v^+ \sim_{v \to \infty} 1/v \to 0$. In contrast, for $|Y| > v$, notwithstanding the still 
strong attraction of the X-particle to the origin, it can follow the sign of 
the Y-particle without paying too much cost in matching its amplitude $|v|$. 
Relatively tiny fluctuations of the X-particle but of the same sign as $Y \approx \pm v$ 
will result in a strong $\rho_v^s$, thus justifying that $\rho_v^s \to 1$ for $v \to +\infty$. 


6.2.3 Influence of the Underlying Distribution 
for a Given Conditioning Set 


For a fixed conditioning set defining a specific conditional correlation coefficient 
like $\rho_v^+$ or $\rho_v^s$, the behavior of these coefficients can be dramatically 
different from one pair of random variables to another, depending on their 
underlying joint distribution. As an example, let the variables $X$ and $Y$ have 
a multivariate Student’s distribution with $\nu$ degrees of freedom and an (unconditional) 
correlation coefficient $\rho$. According to the proposition stated in 
Appendix 6.B.1, we have the exact formula 

$$\rho_A = \frac{\rho}{\sqrt{\rho^2 + \dfrac{E[E(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A]}{\mathrm{Var}(Y \mid Y \in A)}}}. \tag{6.12}$$

Explicit formulas for $E[E(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A]$ and $\mathrm{Var}(Y \mid Y \in A)$ are 
also given in Appendix 6.B.1. The proof of (6.12) is presented in Appendix 
6.B.2. 
Expression (6.12) is the analog for a Student bivariate distribution of (6.2) 
derived above for the Gaussian bivariate distribution. Again, $\rho$ and $\rho_A$ share 
the following properties: they have the same sign, $\rho_A$ equals zero if and only 
if $\rho$ equals zero and $\rho_A$ can be either greater or smaller than $\rho$. Applying this 
general formula (6.12) to the calculation of $\rho_v^+$ and $\rho_v^s$, we find (see Appendices 
6.B.3 and 6.B.4) that, conditioning on large returns, 

$$\rho_v^+ \longrightarrow_{v \to +\infty} \frac{\rho}{\sqrt{\rho^2 + (\nu-1)\sqrt{\dfrac{\nu-2}{\nu}}\,(1-\rho^2)}}, \tag{6.13}$$

while when conditioning on large volatility, 

$$\rho_v^s \longrightarrow_{v \to +\infty} \frac{\rho}{\sqrt{\rho^2 + \dfrac{1}{\nu-1}\sqrt{\dfrac{\nu-2}{\nu}}\,(1-\rho^2)}}. \tag{6.14}$$

$\rho_v^+$ and $\rho_v^s$ both converge, at infinity, to nonvanishing constants (except for 
$\rho = 0$). Moreover, for $\nu$ larger than $\nu_c \approx 2.839$, this constant is smaller than 
the unconditional correlation coefficient $\rho$, for all values of $\rho$, in the case of $\rho_v^+$, 
while for $\rho_v^s$ it is always larger than $\rho$, whatever $\nu$ (larger than two) may be. 

These results show that, conditioned on large returns, $\rho_v^+$ is a decreasing 
function of the threshold $v$ (at least when $\nu > 2.839$), while, conditioned on 
large volatilities, $\rho_v^s$ is an increasing function of $v$. 
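The threshold value quoted above can be recovered numerically. The sketch below assumes that the two limiting constants have the form $\rho / \sqrt{\rho^2 + c\,(1-\rho^2)}$ with $c = (\nu-1)\sqrt{(\nu-2)/\nu}$ for signed exceedance and $c = \sqrt{(\nu-2)/\nu}/(\nu-1)$ for absolute exceedance. This is a reading of (6.13) and (6.14) adopted here because it reproduces the threshold near 2.839, and it should be checked against Appendices 6.B.3 and 6.B.4. The function names are illustrative.

```python
import math


def student_limit_plus(rho, nu):
    """Assumed large-v limit of rho_v^+ for a bivariate Student's law (nu > 2)."""
    c = (nu - 1.0) * math.sqrt((nu - 2.0) / nu)
    return rho / math.sqrt(rho * rho + c * (1.0 - rho * rho))


def student_limit_s(rho, nu):
    """Assumed large-v limit of rho_v^s for a bivariate Student's law (nu > 2)."""
    c = math.sqrt((nu - 2.0) / nu) / (nu - 1.0)
    return rho / math.sqrt(rho * rho + c * (1.0 - rho * rho))


def critical_nu(lo=2.0, hi=4.0, tol=1e-10):
    """Bisection for the nu at which the limit of rho_v^+ equals rho,
    i.e. the root of (nu - 1) * sqrt((nu - 2) / nu) = 1 on [lo, hi]."""
    def f(nu):
        return (nu - 1.0) * math.sqrt((nu - 2.0) / nu) - 1.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


nu_c = critical_nu()
print(nu_c, student_limit_plus(0.5, 5.0), student_limit_s(0.5, 5.0))
```

Bisection is sufficient here because the bracketed function increases monotonically from $-1$ at $\nu = 2$ to a positive value at $\nu = 4$.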

To give another example, let us now assume that $X$ and $Y$ are two random 
variables following the equation: 

$$X = \beta Y + \epsilon, \tag{6.15}$$

where $\beta$ is a nonrandom real coefficient and $\epsilon$ an idiosyncratic noise independent 
of $Y$, whose distribution admits a centered moment of second order $\sigma_\epsilon^2$. 
Let us also denote by $\sigma_y^2$ the second centered moment of the variable $Y$. This 
relation between $X$ and $Y$ corresponds to the so-called one factor model. This 
one factor model with independence between $Y$ and $\epsilon$ is of course naive for 
concrete applications, as it neglects the potential influence of other factors in 
the determination of $X$. However, it has been argued to be a useful model in 
the context of contagion, and several studies have been based upon it (see [29] 
or [178], for instance). Moreover, it provides a simple illustrative model with 
rich and somewhat surprising results. 

One can straightforwardly show that the conditional correlation coefficient 
of X and Y is 


p 
PA = 
\ Var(y) 
f+ 0 Pvt eA 


(6.16) 


where 
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pee (6.17) 


(Pot +o? 


denotes the unconditional correlation coefficient of X and Y. Note that the 
term o? in the expression (6.17) of p is the only place where the influence of 
the idiosyncratic noise is felt. 

Expression (6.16) is the same as (6.2) for the bivariate Gaussian situation
studied in Sect. 6.2.2. This is not surprising since, in the case where Y and
$\epsilon$ have univariate Gaussian distributions, the joint distribution of X
and Y is a bivariate Gaussian distribution. The new fact is that this expression
(6.16) remains true whatever the distribution of Y and $\epsilon$, provided that
their second moments exist.
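
The distribution-free character of (6.16) can be checked numerically. The
following is only a minimal sketch under stated assumptions: the factor loading
0.8, the Student factor with six degrees of freedom, the uniform noise and the
conditioning sets A = (v, +inf) are illustrative choices that do not come from
the text, and `predicted_conditional_corr` is a hypothetical helper name.

```python
import numpy as np


def predicted_conditional_corr(rho, var_y, var_y_given_a):
    """Right-hand side of (6.16)."""
    return rho / np.sqrt(rho**2 + (1.0 - rho**2) * var_y / var_y_given_a)


rng = np.random.default_rng(0)
n = 400_000
alpha = 0.8                               # hypothetical factor loading
y = rng.standard_t(6, size=n)             # non-Gaussian factor, finite variance
eps = rng.uniform(-1.5, 1.5, size=n)      # non-Gaussian idiosyncratic noise
x = alpha * y + eps                       # one factor model (6.15)

rho = np.corrcoef(x, y)[0, 1]             # unconditional correlation
results = []
for v in (0.0, 1.0, 2.0):
    in_a = y > v                          # conditioning set A = (v, +inf)
    empirical = np.corrcoef(x[in_a], y[in_a])[0, 1]
    predicted = predicted_conditional_corr(rho, y.var(), y[in_a].var())
    results.append((v, empirical, predicted))
    print(f"v = {v:.1f}  empirical = {empirical:.3f}  predicted = {predicted:.3f}")
```

In such a run the empirical and predicted values should agree up to sampling
noise, even though neither Y nor the noise is Gaussian.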

We now present the asymptotic expression of $\rho_A$ for Y with a Gaussian
or a Student’s distribution. Note that the expression of $\rho_A$ is simple
enough to allow for exact calculations for a larger class of distributions, but
for illustration, these two simple cases will be sufficient.

Assuming that Y has a Gaussian distribution, while the distribution of
$\epsilon$ can be arbitrary (provided that $\mathrm{E}[\epsilon^2] < \infty$),
allows us to show that the same results as those given by (6.3) and (6.7) still
hold, so that $\rho_v^+$ goes to zero, while $\rho_v^s$ goes to one.

In contrast, assuming that Y has a Student’s distribution yields for both
$\rho_v^+$ and $\rho_v^s$:

$$\rho_v^{+,s} \sim_{v \to \infty} \frac{\mathrm{sgn}(\alpha)}{\sqrt{1 + \dfrac{K}{v^2}}}, \tag{6.18}$$

where K is a positive constant. $\rho_v^{+,s}$ thus goes to $\pm 1$ as $v$ goes
to infinity with $1 - |\rho_v^{+,s}| \propto 1/v^2$, which shows that they can
have similar behaviors.


6.2.4 Conditional Correlation Coefficient on Both Variables 


Since the exploration of the behavior of the correlation coefficient conditioned
on only one variable clearly indicates that it can exhibit any kind of behavior,
it is natural to look for the effect of a more constraining conditioning. To
this aim, let us consider two random variables X and Y and define their
conditional correlation coefficient $\rho_{A,B}$, conditioned upon $X \in A$ and
$Y \in B$, where A and B are two subsets of $\mathbb{R}$ such that
$\Pr\{X \in A, Y \in B\} > 0$, by

$$\rho_{A,B} = \frac{\mathrm{Cov}(X, Y \mid X \in A, Y \in B)}{\sqrt{\mathrm{Var}(X \mid X \in A, Y \in B) \cdot \mathrm{Var}(Y \mid X \in A, Y \in B)}}. \tag{6.19}$$

In this case, it is much more difficult to obtain general results for any
specified class of distributions compared with the previous case of conditioning
on a single variable. Here, we give the asymptotic behavior for a Gaussian
distribution in the situation detailed below, using the expressions in [252,
page 113], or proposition A.1 of [15].

Let us assume that the pair of random variables (X,Y) has a Normal
distribution with unit unconditional variance and unconditional correlation
coefficient $\rho$. The subsets A and B are both chosen equal to $[u, +\infty)$,
with $u \in \mathbb{R}_+$, so that we focus on the correlation coefficient
conditional on the returns of both X and Y larger than the threshold $u$.
Denoting by $\rho_u$ the correlation coefficient conditional on this particular
choice for the subsets A and B, Appendix 6.A shows that, for large $u$,

$$\rho_u \sim_{u \to \infty} \rho\,\frac{1 + \rho}{1 - \rho} \cdot \frac{1}{u^2}, \tag{6.20}$$

which goes to zero. This decay is faster than $\rho_v^+ \sim_{v \to \infty} 1/v$
given by (6.3) resulting from the conditioning on a single variable. However,
unfortunately, there is no qualitative change. Thus, the correlation coefficient
conditioned on both variables does not yield new significant information and
does not provide any special improvement with respect to the correlation
coefficient conditioned on a single variable.


6.2.5 An Example of Empirical Implementation 


Let us consider four national stock markets in Latin America, namely
Argentina (MERVAL index), Brazil (IBOV index), Chile (IPSA index) and Mexico
(MEXBOL index). We are particularly interested in the contagion effects
which may have occurred across these markets. We will study this question
for the market indexes expressed in US dollars to emphasize the effect of the
devaluations of local currencies and to account for monetary crises. Doing so,
we follow the same methodology as in most contagion studies (see [178], for
instance). Our sample contains the daily (log) returns of each stock index in
local currency and in US dollars during the time interval from 15 January 1992
to 15 June 2002 and thus encompasses both the Mexican crisis and the more
recent Argentinean crisis.

Before applying the theoretical results derived above, we need to test
whether the distributions of the returns are not too fat-tailed so that the
correlation coefficient exists. Recall that this is the case if and only if the
tail of the distribution decays faster than a power law with tail index
$\mu = 2$, and that its estimator given by Pearson’s coefficient is well behaved
if at least the fourth moment of the distribution is finite.

Figure 6.1 shows the complementary distribution of the positive and negative
tails of the index returns of four Latin American countries in US dollars.
The positive tail clearly decays faster than a power law with tail index
$\mu = 2$. In fact, Hill’s estimator provides a value ranging between 3 and 4
for the four indexes. The situation for the negative tail is slightly different,
particularly for the Brazilian index. For the Argentinean, the Chilean and the
Mexican indexes, the negative tail behaves almost like the positive one, but for
the Brazilian index, the negative tail exponent is hardly larger than two, as
confirmed by Hill’s estimator. This means that, in the Brazilian case, the
estimates of the correlation coefficient will be particularly noisy and thus of
weak statistical value.

Fig. 6.1. The upper (respectively lower) panel graphs the complementary
distribution of the positive (respectively minus the negative) returns in US
dollars of the indices of four countries (Argentina, Brazil, Chile and Mexico).
The straight line represents the slope of a power law with tail exponent
$\mu = 2$
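
Hill’s estimator, used above to gauge the tail index, can be sketched in a few
lines. This is a minimal illustration, not the procedure actually applied to the
index data: the function name `hill_tail_index`, the number k of upper order
statistics and the synthetic Pareto sample are all assumptions made for the
example.

```python
import numpy as np


def hill_tail_index(sample, k):
    """Hill estimate of the tail index mu from the k largest observations."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]   # descending order
    if not 0 < k < x.size:
        raise ValueError("k must satisfy 0 < k < sample size")
    if x[k] <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    log_excess = np.log(x[:k]) - np.log(x[k])             # log excesses over x[k]
    return 1.0 / log_excess.mean()


rng = np.random.default_rng(1)
# Synthetic classical Pareto sample with tail index 3 (illustrative data only).
pareto_sample = rng.pareto(3.0, size=200_000) + 1.0
mu_hat = hill_tail_index(pareto_sample, k=2_000)
print(f"Hill estimate of the tail index: {mu_hat:.2f}")
```

For a negative tail, the same function would be applied to minus the returns.
In practice the estimate depends on k, so it is usually inspected over a range
of values of k rather than read off at a single one.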

We have checked that the fat-tailedness of the indexes expressed in US dollars
comes from the impact of the exchange rates. Thus, an alternative would be
to consider the indexes in local currency, following the methodology of [314]
and [315], but this would lead us to focus on the linkages between markets only
and to neglect the impact of the devaluations, which is precisely the main
concern of studies on contagion.

Figures 6.2, 6.3 and 6.4 give the conditional correlation coefficient
$\rho_v^{+,-}$ (plain thick line) for the pairs (Argentina/Brazil),
(Brazil/Chile) and (Chile/Mexico), while Figs. 6.5, 6.6 and 6.7 show the
conditional correlation coefficient $\rho_v^s$ for the same pairs. For each
figure, the thick dashed line gives the theoretical curve obtained under the
bivariate Gaussian assumption, whose analytical expressions can be found in
Sect. 6.2.2. The unconditional correlation coefficient of the Gaussian model is
set to the empirically estimated unconditional correlation coefficient. The two
thin dashed lines represent the interval within which we cannot reject, at the
95% confidence level, the hypothesis according to which the estimated
conditional correlation coefficient is equal to the theoretical one. This
confidence interval has been estimated using Fisher’s statistics. Similarly, the
thick dotted curve graphs the theoretical conditional correlation coefficient
obtained under the bivariate Student’s assumption with $\nu = 3$ degrees of
freedom (whose expressions are given in Appendices 6.B.3 and 6.B.4) and the two
thin dotted lines delimit its 95% confidence interval. Here, Fisher’s statistics
cannot be applied, since it requires at least that the fourth moment of the
distribution exists. In fact, Meerschaert and Scheffler have shown that, for
$\nu = 3$, the distribution of the sample correlation converges to a stable law
with index 3/2 [356]. This explains why the confidence interval for the
Student’s model with three degrees of freedom is much larger than the confidence
interval for the Gaussian model. In the present case, we have used a bootstrap
method to derive this confidence interval since the scale factor of the stable
law is difficult to calculate.

Fig. 6.2. In the upper panel, the thick plain curve depicts the correlation
coefficient between the daily returns of the Argentinean and the Brazilian stock
indices conditional on the Brazilian stock index daily returns larger than
(smaller than) a given positive (negative) value $v$ (after normalization by the
standard deviation). The thick dashed curve represents the theoretical
conditional correlation coefficient $\rho_v^{+,-}$ calculated for a bivariate
Gaussian model, while the two thin dashed curves define the area within which we
cannot consider at the 95% confidence level that the estimated correlation
coefficient is significantly different from its Gaussian theoretical value. The
dotted curves provide the same information under the assumption of a bivariate
Student’s model with $\nu = 3$ degrees of freedom. The lower panel is the same
as the upper panel but the conditioning is on the Argentinean stock index daily
returns larger than (smaller than) a given positive (negative) value $v$ (after
normalization by the standard deviation)

Fig. 6.3. Same as Fig. 6.2 for the (Brazil, Chile) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Chilean (respectively
Brazilian) stock market index
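
For the Gaussian benchmark, a confidence band of the kind described above can be
sketched with the standard Fisher z-transformation. That this is exactly what
“Fisher’s statistics” denotes in the text is an assumption, as are the function
name and the sample size used in the example; the normal approximation behind it
is only meaningful when the moment conditions recalled above hold.

```python
import math


def fisher_confidence_interval(r, n, z_crit=1.959963984540054):
    """Approximate 95% interval around a correlation r estimated from n points.

    Uses z = atanh(r), treated as approximately normal with standard
    deviation 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("need more than three observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical example: theoretical value 0.5, 103 observations in the
# conditioning set.
lower, upper = fisher_confidence_interval(0.5, 103)
print(f"95% band: [{lower:.3f}, {upper:.3f}]")
```

The band widens quickly as the threshold increases, because the number n of
observations left in the conditioning set shrinks.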

In Figs. 6.2, 6.3 and 6.4, the changes in the conditional correlation
coefficients $\rho_v^{+,-}$ are not significantly different, at the 95%
confidence level, from those obtained with a bivariate Student’s model with
three degrees of freedom. In contrast, the Gaussian model is almost always
rejected, as expected, since marginal returns distributions are not Gaussian (as
shown by Fig. 6.1). In fact, similar results hold (but are not depicted here)
for the three other pairs (Argentina/Chile), (Argentina/Mexico) and
(Brazil/Mexico). Since these results are compatible with a Student’s model with
constant correlation, this suggests that no change in the correlations, and
therefore no contagion mechanism, needs to be invoked to explain the data.

Fig. 6.4. Same as Fig. 6.2 for the (Chile, Mexico) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Mexican (respectively Chilean)
stock market index

Let us now discuss the results obtained for the correlation coefficient
conditioned on the volatility. Figures 6.5 and 6.7 show that the estimated
correlation coefficients conditioned on volatility remain consistent with the
Student’s model with three degrees of freedom, while they still reject the
Gaussian model. In contrast, Fig. 6.6 shows that the increase of the correlation
cannot be explained by any of the Gaussian or Student models, when conditioning
on the Mexican index volatility. Indeed, when the Mexican index volatility
becomes larger than 2.5 times its standard deviation, none of these models can
account for the increase of the correlation. The same discrepancy is observed
for the pairs (Argentina/Chile), (Argentina/Mexico) and (Brazil/Mexico), which
are not shown here. In each case, the Chilean and the Mexican markets have
an impact on the Argentinean and the Brazilian markets which cannot be
accounted for by either the Gaussian model or the Student model with
constant correlation.

To conclude this empirical part, there is no significant increase in the real
correlation between Argentina and Brazil on the one hand and between Chile
and Mexico on the other hand, when the volatility or the returns exhibit large
moves. In contrast, in periods of high volatility, the Chilean and Mexican
markets seem to have a genuine impact on the Argentinean and Brazilian markets.
A priori, this should confirm the existence of a contagion across these
markets. However, this conclusion is based only on two theoretical models. One
should thus remain cautious before concluding positively on the existence of
contagion on the sole basis of these results, in particular in view of the use
of theoretical models which are all symmetric in their positive and negative
tails. Such a symmetry is crucial for the derivation of the theoretical
expressions of $\rho_v^s$. However, the empirical sample distributions are
certainly not symmetric, as shown in Fig. 6.1. Using univariate and bivariate
switching volatility models, Edwards and Susmel [142] have found strong
volatility co-movements in Latin America but no clear evidence of contagion.

Fig. 6.5. In the upper panel, the thick plain curve gives the correlation
coefficient between the daily returns of the Argentinean and the Brazilian stock
indices conditioned on the daily volatility of the Brazilian stock index being
larger than a given value $v$ (after normalization by the standard deviation).
The thick dashed curve represents the theoretical conditional correlation
coefficient $\rho_v^s$ calculated for a bivariate Gaussian model, while the two
thin dashed curves delineate the area within which we cannot consider at the 95%
confidence level that the estimated correlation coefficient is significantly
different from its Gaussian theoretical value. The dotted curves provide the
same information using a bivariate Student’s model with $\nu = 3$ degrees of
freedom. The lower panel is the same as the upper panel but the conditioning is
on the Argentinean stock index

Fig. 6.6. Same as Fig. 6.5 for the (Brazil, Chile) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Chilean (respectively
Brazilian) stock market index


6.2.6 Summary 


The previous sections have shown that the conditional correlation coefficients
can exhibit all possible types of behavior, depending on their conditioning set
and the underlying distributions of returns. More precisely, we have shown
that the correlation coefficients, conditioned on large returns or volatility
above a threshold $v$, can be either increasing or decreasing functions of the
threshold, can go to any value between zero and one when the threshold goes
to infinity, and can produce contradictory results in the sense that, depending
on whether or not a trend is accounted for, one can conclude either that there
is no linear correlation or that the linear correlation is perfect. Moreover,
due to the large statistical fluctuations of the empirical estimates, one should
be very careful when concluding on an increase or decrease of the genuine
correlations.

Thus, from the general standpoint of the study of extreme dependences,
but more particularly for the specific problem of the contagion across
countries, the use of conditional correlation does not seem very informative and
is sometimes misleading since it leads to spurious changes in the observed
correlations: even when the unconditional correlation remains constant,
conditional correlations yield artificial changes. Since one of the most
commonly accepted and used definitions of contagion is the detection of an
increase of the conditional correlations during a period of turmoil, namely when
the volatility increases, these results cast serious doubt on previous studies.
In this respect, the conclusions of Calvo and Reinhart [87], about the
occurrence of contagion across Latin American markets during the 1994 Mexican
crisis, but more generally also the results of [271] or [299], on the effect of
the October 1987 crash on the linkage of national markets, must be considered
with some caution. It is quite desirable to find a more reliable tool for
studying extreme dependences.

Fig. 6.7. Same as Fig. 6.5 for the (Chile, Mexico) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Mexican (respectively Chilean)
stock market index


6.3 Conditional Concordance Measures 


The (conditional) correlation coefficients, which have just been investigated,
suffer from several theoretical as well as empirical deficiencies. From the
theoretical point of view, they constitute just linear measures of dependence.
Thus, as recalled in Chap. 4, they are fully satisfying only for the
description of the dependence of variables with elliptical distributions.
Moreover, we have seen that the correlation coefficient aggregates the
information contained both in the marginal and in the collective behavior. The
correlation coefficient is not invariant under an increasing change of variable,
a transformation which is known to leave the dependence structure unchanged.
From the empirical standpoint, we have seen that, for some data, the correlation
coefficient may not always exist, and even when it exists, it cannot always be
accurately estimated, due to sometimes “wild” statistical fluctuations. Thus,
it is desirable to find another measure of the dependence between two assets or
more generally between two random variables, which, contrary to the linear
correlation coefficient, is always well-defined and only depends on the copula
properties. This ensures that this measure is not affected by a change in the
marginal distributions (provided that the mapping is increasing). It turns out
that this desirable property is shared by all measures of concordance. Among
these measures are the well-known Kendall’s tau, Spearman’s rho or Gini’s
beta (see Sect. 4.2).

However, these concordance measures are not well-adapted, as such, to the 
study of extreme dependence, because they are functions of the whole distrib- 
ution, including the moderate and small returns. A simple idea to investigate 
the extreme concordance properties of two random variables is to calculate 
these quantities conditioned on values larger than a given threshold and let 
this threshold go to infinity. 

In the sequel, we will only focus on the rank correlation which can be easily 
estimated empirically. It offers a natural generalization of the (linear) corre- 
lation coefficient. Indeed, Spearman’s rho quantifies the degree of functional 
dependence, whatever the functional dependence between the two random 
variables may be. This represents a very interesting improvement. Perfect
correlation (respectively anticorrelation) gives a value 1 (respectively −1)
both for the standard correlation coefficient and for the Spearman’s rho. Otherwise,
there is no general relation allowing us to deduce the Spearman’s rho from 
the correlation coefficient and vice-versa. 


6.3.1 Definition 


Recall that Spearman’s rho, denoted $\rho_s$ in the sequel, measures the
difference between the probability of concordance and the probability of
discordance for the two pairs of random variables $(X_1, Y_1)$ and $(X_2, Y_3)$,
where the pairs $(X_1, Y_1)$, $(X_2, Y_2)$ and $(X_3, Y_3)$ are three
independent realizations drawn from the same distribution:

$$\rho_s = 3\left(\Pr[(X_1 - X_2)(Y_1 - Y_3) > 0] - \Pr[(X_1 - X_2)(Y_1 - Y_3) < 0]\right). \tag{6.21}$$


Thus, setting $U = F_X(X)$ and $V = F_Y(Y)$, we have seen that $\rho_s$ is
nothing but the (linear) correlation coefficient of the uniform random variables
U and V (see Chap. 4):

$$\rho_s = \frac{\mathrm{Cov}(U, V)}{\sqrt{\mathrm{Var}(U)\,\mathrm{Var}(V)}}, \tag{6.22}$$

which justifies its name as a correlation coefficient of the rank, and shows that
it can easily be estimated.

An attractive feature of Spearman’s rho is that it is independent of the
margins, as we can see in equation (6.22). Thus, contrary to the linear
correlation coefficient, which aggregates the marginal properties of the
variables with their collective behavior, the rank correlation coefficient takes
into account only the dependence structure of the variables.

Using expression (6.22), a natural definition of the conditional rank
correlation, conditioned on V larger than a given threshold $v$, can be proposed:

$$\rho_s(v) = \frac{\mathrm{Cov}(U, V \mid V \ge v)}{\sqrt{\mathrm{Var}(U \mid V \ge v)\,\mathrm{Var}(V \mid V \ge v)}}, \tag{6.23}$$

whose expression in terms of the copula $C(\cdot,\cdot)$ is given in Appendix 6.C.
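
An empirical counterpart of (6.23) follows directly from the definition: replace
U and V by normalized ranks and compute the ordinary correlation over the
observations whose rank exceeds the threshold. The sketch below assumes
continuous data without ties; the function names and the simulated Gaussian
sample are illustrative choices, not taken from the text.

```python
import numpy as np


def pseudo_observations(sample):
    """Map a sample to (0, 1) through its normalized ranks (no ties assumed)."""
    sample = np.asarray(sample, dtype=float)
    ranks = np.argsort(np.argsort(sample)) + 1        # ranks 1..n
    return ranks / (sample.size + 1.0)


def conditional_spearman_rho(x, y, v):
    """Empirical version of (6.23): correlation of (U, V) given V >= v."""
    u = pseudo_observations(x)
    w = pseudo_observations(y)                        # plays the role of V
    keep = w >= v
    if keep.sum() < 3:
        raise ValueError("too few observations above the threshold")
    return np.corrcoef(u[keep], w[keep])[0, 1]


rng = np.random.default_rng(2)
cov = [[1.0, 0.5], [0.5, 1.0]]                        # illustrative correlation 0.5
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T
for v in (0.0, 0.5, 0.9):
    print(f"v = {v:.1f}  rho_s(v) = {conditional_spearman_rho(x, y, v):.3f}")
```

With v = 0 the function reduces to the usual (unconditional) rank correlation,
since every observation is kept.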

Obviously, $\rho_s(v)$ is not a true concordance measure, as defined at the end
of Sect. 4.2. An alternative definition of the conditional Spearman’s rho [96],
and more generally of any conditional concordance measure, which preserves
all the properties of concordance measures, can be obtained by considering
the concordance measures of the conditional copula defined by (3.58). As
an example, the conditional Kendall’s tau would be defined by the Kendall’s
tau of the conditional copula. This idea has several advantages. In particular,
when one focuses on the conditional Kendall’s tau, asymptotic results can
be straightforwardly derived for Archimedean copulas, in relation with result
(3.61). Indeed, considering an Archimedean copula with a regularly varying
generator $\varphi$ (with tail index $\theta$), the conditional copula (3.58)
converges to Clayton’s copula with parameter $\theta$ as the threshold $u$ goes
to zero. Therefore, Kendall’s tau $\tau_u$ of the conditional copula converges
to Kendall’s tau of Clayton’s copula, so that

$$\lim_{u \to 0} \tau_u = \frac{\theta}{\theta + 2}, \tag{6.24}$$

according to Table 4.1.
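
As a cross-check of the limit (6.24), Kendall’s tau of Clayton’s copula can be
recovered from the standard expression of Kendall’s tau for an Archimedean
copula in terms of its generator. The normalization of the Clayton generator
used below is an assumption and may differ from the one adopted in Chap. 3; the
resulting value of tau does not depend on it.

```latex
% Kendall's tau of an Archimedean copula with generator \varphi:
\tau = 1 + 4 \int_0^1 \frac{\varphi(t)}{\varphi'(t)} \, dt .
% Clayton generator (assumed normalization), for \theta > 0:
\varphi(t) = \frac{t^{-\theta} - 1}{\theta}, \qquad
\varphi'(t) = - t^{-\theta - 1}, \qquad
\frac{\varphi(t)}{\varphi'(t)} = \frac{t^{\theta + 1} - t}{\theta} .
% Integrating term by term:
\tau = 1 + \frac{4}{\theta} \left( \frac{1}{\theta + 2} - \frac{1}{2} \right)
     = 1 - \frac{2}{\theta + 2}
     = \frac{\theta}{\theta + 2} .
```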


6.3.2 Example 


Contrary to the conditional correlation coefficient, it is difficult to obtain
analytical expressions for the conditional Spearman’s rho for the Gaussian
and Student distributions. Obviously, for many families of copulas known
in closed form, equation (6.23) allows for an explicit calculation of
$\rho_s(v)$. However, most copulas of interest in finance have no simple closed
form, so that it is necessary to resort to numerical computations.

Fig. 6.8. Conditional Spearman’s rho for a bivariate Gaussian copula (left panel)
and a Student’s copula with three degrees of freedom (right panel), with an
unconditional linear correlation coefficient $\rho$ = 0.1, 0.3, 0.5, 0.7, 0.9,
as a function of the constraint level $v$

As an example, let us consider the bivariate Gaussian distribution (or
copula) with unconditional correlation coefficient $\rho$. It is well-known that
its unconditional Spearman’s rho is given by

$$\rho_s = \frac{6}{\pi} \arcsin\frac{\rho}{2}. \tag{6.25}$$
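
The closed form (6.25) can be compared with a direct Monte Carlo evaluation of
the probabilistic definition (6.21). The following sketch is illustrative only:
the value 0.7 of the correlation coefficient, the sample size and the helper
names are assumptions made for the example.

```python
import numpy as np


def gaussian_spearman_rho(rho):
    """Closed form (6.25) for the bivariate Gaussian distribution."""
    return 6.0 / np.pi * np.arcsin(rho / 2.0)


def draw_gaussian_pairs(n, rng, rho=0.7):
    """Draw n pairs (X, Y) from a standard bivariate Gaussian distribution."""
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n).T


def spearman_rho_from_triples(draw_pairs, n, rng):
    """Monte Carlo version of (6.21) from three independent sets of pairs."""
    x1, y1 = draw_pairs(n, rng)
    x2, _ = draw_pairs(n, rng)        # only X_2 is needed
    _, y3 = draw_pairs(n, rng)        # only Y_3 is needed
    product = (x1 - x2) * (y1 - y3)
    return 3.0 * (np.mean(product > 0) - np.mean(product < 0))


rng = np.random.default_rng(3)
estimate = spearman_rho_from_triples(draw_gaussian_pairs, 1_000_000, rng)
exact = gaussian_spearman_rho(0.7)
print(f"Monte Carlo: {estimate:.3f}   closed form: {exact:.3f}")
```

The agreement rests on the fact that $X_1 - X_2$ and $Y_1 - Y_3$ are jointly
Gaussian with correlation $\rho/2$, which is where the argument of the arcsine
in (6.25) comes from.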


The left panel of Fig. 6.8 shows the conditional Spearman’s rho $\rho_s(v)$
defined by (6.23) obtained from a numerical integration. We observe the same
bias as for the conditional correlation coefficient, namely the conditional rank
correlation changes with $v$ even though the unconditional correlation is fixed
to a constant value. Nonetheless, this conditional Spearman’s rho seems more
sensitive than the conditional correlation coefficient since one can observe in
the left panel of Fig. 6.8 that, as $v$ goes to one, the conditional Spearman’s
rho $\rho_s(v)$ does not go to zero for all values of $\rho$ (at the precision
of our bootstrap estimates), as previously observed with the conditional
correlation coefficient (see (6.3)).

The right panel of Fig. 6.8 depicts the conditional Spearman’s rho of
Student’s copula with three degrees of freedom. The biases are qualitatively the
same as for the Gaussian copula, but $\rho_s(v)$ goes in this case to zero for
all values of $\rho$ when $v$ goes to one. Thus, here again, several different
behaviors can be observed depending on the underlying copula of the random
variables. Moreover, these two examples show that the quantification of extreme
dependence is a function of the tools used to quantify this dependence. Here,
the conditional Spearman’s $\rho$ goes to a nonvanishing constant for the
Gaussian model, while the conditional (linear) correlation coefficient goes to
zero, contrary to the Student’s distribution for which the situation is exactly
the opposite.


6.3.3 Empirical Evidence


Figures 6.9, 6.10 and 6.11 give the conditional Spearman’s rho respectively for
the (Argentinean/Brazilian), the (Brazilian/Chilean), and the (Chilean/Mexican)
stock markets. As previously, the plain thick line refers to the estimated
correlation, while the dashed lines refer to the Gaussian copula and its 95%
confidence levels and the dotted lines to Student’s copula with three degrees
of freedom and its 95% confidence levels.

Contrary to the cases of the conditional (linear) correlation coefficient
exhibited in Figs. 6.2, 6.3 and 6.4, the empirical conditional Spearman’s
$\rho$ does not always comply with the Student’s model (nor with the Gaussian
one), and thus confirms the discrepancies observed in Figs. 6.5, 6.6 and 6.7. In
all cases, for thresholds $v$ larger than the quantile 0.5 corresponding to the
positive returns, the Student model with three degrees of freedom is almost
always sufficient to explain the data. In contrast, for the negative returns and
thus thresholds $v$ lower than the quantile 0.5, only the interaction between
the Chilean and the Mexican markets is well described by the Student copula,
so that there is no need to invoke the contagion mechanism for this pair. For
all other pairs, none of these models explains the data satisfactorily.
Therefore, for these cases and from the perspective of these models, the
contagion hypothesis seems to be needed.

Fig. 6.9. In the upper panel, the thick curve shows Spearman’s rho between the
Argentinean stock index daily returns and the Brazilian stock index daily
returns. Above the quantile $v$ = 0.5, Spearman’s rho is conditioned on the
Brazilian index daily returns whose quantiles are larger than $v$, while below
the quantile $v$ = 0.5 it is conditioned on the Brazilian index daily returns
whose quantiles are smaller than $v$. As in the above figures for the
correlation coefficients, the dashed lines refer to the prediction of the
Gaussian copula and its 95% confidence levels and the dotted lines to Student’s
copula with three degrees of freedom and its 95% confidence levels. The lower
panel is the same as the upper panel but with the conditioning done on the
Argentinean index daily returns

Fig. 6.10. Same as Fig. 6.9 for the (Brazil, Chile) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Chilean (respectively
Brazilian) stock market index

There are however several caveats. First, even though we have considered
the most natural financial models, there may be other models with constant
dependence structure, that we have ignored, which could account for the
observed evolutions of the conditional Spearman’s $\rho$. If this is the case,
then the contagion hypothesis would not be needed. Second, the main discrepancy
between the empirical conditional Spearman’s $\rho$ and the prediction of
Student’s model does not occur in the tails of the distribution, i.e., for large
and extreme movements, but in the bulk. Thus, during periods of turmoil, the
Student’s model with three degrees of freedom seems to remain a good model
of co-movements. Third, the contagion effect is never necessary for upward
moves. Indeed, we observe the same asymmetry or trend dependence as found
in [315] for five major equity markets. This was apparent in Figs. 6.2, 6.3 and
6.4 for $\rho_v^{+,-}$, and is strongly confirmed on the conditional Spearman’s
$\rho$.

Fig. 6.11. Same as Fig. 6.9 for the (Chile, Mexico) pair. The upper (respectively
lower) panel corresponds to a conditioning on the Mexican (respectively Chilean)
stock market index

Interestingly, there is also an asymmetry or directivity in the mutual
influence between markets. For instance, the Chilean and Mexican markets have
an influence on the Argentinean and Brazilian markets, but the latter do not
have any impact on the Mexican and Chilean markets. Chile and Mexico have
no contagion effect on each other while Argentina and Brazil do.

These empirical results on the conditional Spearman’s rho are different
from and often opposite to the conclusion derived from the conditional
correlation coefficients $\rho_v^{+,-}$. This highlights the difficulty in
obtaining reliable, unambiguous and sensitive estimations of conditional
correlation measures. In particular, Pearson’s coefficient usually employed to
estimate the correlation coefficient between two variables is known to be not
very efficient when the variables are fat-tailed and when the estimation is
performed on a small sample. Indeed, with small samples, Pearson’s coefficient
is very sensitive to the largest value, which can lead to an important bias in
the estimation. Moreover, even with large sample sizes, Meerschaert and
Scheffler [356] have shown that the nature of convergence of the Pearson
coefficient of two time series with tail index $\mu$ toward the theoretical
correlation, as the sample size T tends to infinity, is sensitive to the
existence and strength of the theoretical correlation. If there is no
theoretical correlation between the two time series, the sample correlation
tends to zero with Gaussian fluctuations. If the theoretical correlation is
nonzero, the difference between the sample correlation and the theoretical
correlation times $T^{1 - 2/\mu}$ converges in distribution to a stable law
with index $\mu/2$. These large statistical fluctuations are responsible for the
lack of accuracy of the estimated conditional correlation coefficient
encountered in the previous section. Thus, we think that the conditional
Spearman’s $\rho$ provides a good alternative both from a theoretical and an
empirical viewpoint.


6.4 Extreme Co-movements 


For the sake of completeness, and since it is directly related to the
multivariate extreme value theory, we study the coefficient of tail dependence
$\lambda$, which has been defined in Sect. 4.5. It would seem that the
coefficient of tail dependence could provide a useful measure of the extreme
dependence between two random variables for the analysis of contagion between
markets. Two possibilities can occur. Either the whole data set does not exhibit
tail dependence, and a contagion mechanism seems necessary to explain the
occurrence of concomitant large movements during turmoil periods, or the data
set exhibits tail dependence which by itself is enough to produce concomitant
extremes (and contagion is not needed).

Unfortunately, the empirical estimation of the coefficient of tail dependence
is a strenuous task. Indeed, a direct estimation of the conditional probability
$\Pr\{X > F_X^{-1}(u) \mid Y > F_Y^{-1}(u)\}$, which should tend to $\lambda$
when $u \to 1$, is very difficult to implement in practice due to the
combination of the curse of dimensionality and the drastic decrease of the
number of realizations as $u$ becomes close to one. A better approach consists
in using kernel methods, which generally provide smooth and accurate estimators
[168, 284, 305]. However, these smooth estimators lead to copulas which are
differentiable. This automatically gives vanishing tail dependence, as already
mentioned in Chap. 5. Indeed, in order to obtain a nonvanishing coefficient of
tail dependence, it is necessary for the corresponding copula to be
nondifferentiable at the point (1,1) (or at (0,0)). An alternative is then the
fully parametric approach. One can choose to model dependence via a specific
copula, and thus to determine the associated tail dependence [315, 334, 380].
The problem with such a method is that the choice of the parameterization of the
copula amounts to choosing a priori whether or not the data present tail
dependence.

In fact, there are three ways of estimating the tail dependence coefficient.
The first two methods are specific to a class of copulas or of models, while
the last one is very general, but less accurate. The first method is only
reliable when the underlying copula is known to be Archimedean. In such a
case, the limit theorem established by Juri and Wüthrich [260] (see Chap. 3)
allows one to estimate the tail dependence. The problem is that it is not
obvious that the Archimedean copulas provide a good representation of the
dependence structure for financial assets. For instance, the Archimedean
copulas are generally inconsistent with a representation of assets by linear
factor models. A second method, based upon results of Sect. 4.5.3, offers good
results by allowing one to estimate the tail dependence in a semiparametric way,
which solely relies on the estimation of marginal distributions, when the data
can be explained by a factor model [332, 335].

When none of these situations occurs, or when the factors are too difficult
to extract, a third and fully nonparametric method exists, which is based upon
the mathematical results of Ledford and Tawn [294, 295] and Coles et al. [106]
and has recently been applied by Poon et al. [390]. The method consists in
transforming the original random variables $X$ and $Y$ into Fréchet random
variables denoted by $S$ and $T$ respectively. Then, considering the variable
$Z = \min\{S, T\}$, its survival distribution is

$$\Pr\{Z > z\} = \mathcal{L}(z) \cdot z^{-1/\eta} \quad \text{as } z \to \infty, \tag{6.26}$$

where $\mathcal{L}$ denotes a slowly varying function. Now, assuming that

$$\lim_{z \to \infty} \mathcal{L}(z) = d \in (0, 1], \tag{6.27}$$

the coefficient of tail dependence $\lambda$ and the coefficient $\bar\lambda$, defined by (4.84),
are simple functions of $d$ and $\eta$: $\bar\lambda = 2\eta - 1$ with $\lambda = 0$ if $\eta < 1$, or $\bar\lambda = 1$
and $\lambda = d$ otherwise. The parameters $\eta$ and $d$ can be estimated by maximum
likelihood, and deriving their asymptotic statistics allows one to test whether
the hypothesis $\bar\lambda = 1$ can be rejected or not, and consequently, whether the
data present tail dependence or not.
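
A minimal sketch of this procedure is given below, under stated assumptions: the margins are mapped to unit Fréchet variables through empirical ranks, and $\eta$ and $d$ are estimated with a Hill-type estimator on the $k$ largest values of $Z$ (the maximum likelihood estimator for an exact power-law tail above a threshold). The function names, the rank transform and the choice of $k$ are illustrative assumptions, not the exact implementation of Poon et al.

```python
import math
import random

def to_unit_frechet(values):
    """Map a sample to unit Frechet margins using its empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # Empirical probability rank/(n+1), then Frechet quantile -1/log(p).
        out[i] = -1.0 / math.log(rank / (n + 1.0))
    return out

def estimate_eta_d(xs, ys, k):
    """Hill-type estimates of (eta, d) from the k largest values of Z = min(S, T)."""
    s, t = to_unit_frechet(xs), to_unit_frechet(ys)
    z = sorted(min(a, b) for a, b in zip(s, t))
    n = len(z)
    threshold = z[n - k - 1]  # (k+1)-th largest value of Z
    eta = sum(math.log(zi / threshold) for zi in z[n - k:]) / k
    d = (k / n) * threshold ** (1.0 / eta)
    return eta, d

if __name__ == "__main__":
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in range(5000)]
    y = [random.gauss(0.0, 1.0) for _ in range(5000)]
    eta, d = estimate_eta_d(x, y, k=250)
    print("independent Gaussian sample: eta =", eta, "lambda_bar =", 2 * eta - 1)
```

For independent variables one expects $\eta$ close to 1/2 (so $\bar\lambda$ close to 0), while $\eta$ close to 1 is compatible with tail dependence. The estimate is sensitive to the choice of $k$, which is left to the user in this sketch.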

Let us implement this procedure on the four previously considered Latin 
American markets (Argentina, Brazil, Chile and Mexico). The results for the 
estimated values of the coefficient of tail dependence are given in Table 6.1 
both for the positive and the negative tails. The tests show that one cannot 
reject the hypothesis of tail dependence between the four considered Latin 
American markets. Notice that the positive tail dependence is almost always 
slightly smaller than the negative one, which could be linked with the existence 
of trend asymmetry [3815], but it turns out that these differences are not sta- 
tistically significant. These results indicate that, according to this analysis of 
the extreme dependence coefficient, the propensity of extreme co-movements 
is almost the same for each pair of stock markets: even if the transmission 
mechanisms of a crisis are different from one country to another one, the 
propagation occurs with the same probability overall. Thus, the subsequent 
Table 6.1. Coefficients of tail dependence between pairs among four Latin American
markets. The figure in parentheses gives the standard deviation of the estimated
value, derived under the assumption of asymptotic normality of the estimators.
Only the coefficients above the diagonal are indicated since they are symmetric


Negative tail   Argentina   Brazil        Chile         Mexico
Argentina       -           0.28 (0.04)   0.25 (0.04)   0.25 (0.05)
Brazil                      -             0.19 (0.03)   0.25 (0.05)
Chile                                     -             0.24 (0.07)
Mexico                                                  -

Positive tail   Argentina   Brazil        Chile         Mexico
Argentina       -           0.21 (0.06)   0.20 (0.04)   0.22 (0.04)
Brazil                      -             0.28 (0.04)   0.19 (0.04)
Chile                                     -             0.19 (0.03)
Mexico                                                  -


Table 6.2. Coefficients of tail dependence between pairs among four Latin American
markets derived under the assumption of a Student copula with three degrees
of freedom


Student hypothesis, $\nu = 3$

                Argentina   Brazil   Chile   Mexico
Argentina       -           0.24     0.25    0.27
Brazil                      -        0.24    0.27
Chile                                -       0.28
Mexico                                       -


risks are the same. Table 6.2 also gives the coefficients of tail dependence es- 
timated under the Student’s copula (or in fact any copula derived from an 
elliptical distribution — see Chap. 4) with three degrees of freedom, given by 
expression (4.91). One can observe a remarkable agreement between these 
values and the nonparametric estimates given in Table 6.1. This is consis- 
tent with the results given by the conditional Spearman’s rho, for which we 
have remarked that the Student’s copula seems to reasonably account for the 
extreme dependence. 
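
As an illustration, the tail dependence of a Student copula can be evaluated numerically. The sketch below assumes that expression (4.91) is the standard closed form $\lambda = 2\,\bar T_{\nu+1}\!\left(\sqrt{\nu+1}\,\sqrt{(1-\rho)/(1+\rho)}\right)$, where $\bar T_{\nu+1}$ is the Student survival function with $\nu + 1$ degrees of freedom. The survival function is obtained by Simpson quadrature to stay within the standard library; all function names are hypothetical.

```python
import math

def student_pdf(x, nu):
    """Univariate Student density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def student_sf(x, nu):
    """Student survival function Pr{T > x}; intended for nu > 1."""
    if x < 0:
        return 1.0 - student_sf(-x, nu)
    if x < 1:
        return 0.5 - simpson(lambda y: student_pdf(y, nu), 0.0, x)
    # Tail integral with the substitution y = x / s, s in (0, 1].
    def g(s):
        return 0.0 if s == 0.0 else student_pdf(x / s, nu) * x / (s * s)
    return simpson(g, 0.0, 1.0)

def student_tail_dependence(rho, nu):
    """Tail dependence of a Student copula (assumed closed form, see lead-in)."""
    if rho <= -1.0:
        return 0.0
    arg = math.sqrt(nu + 1) * math.sqrt((1 - rho) / (1 + rho))
    return 2.0 * student_sf(arg, nu + 1)
```

Calling `student_tail_dependence(rho, 3)` for a given correlation `rho` gives the kind of value reported in Table 6.2; the correlations used for that table are not reproduced here.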


6.5 Synthesis and Consequences


Table 6.3 summarizes the asymptotic dependences for large $v$ and $u$ of the
signed conditional correlation coefficient $\rho_v^+$, the unsigned conditional
correlation coefficient $\rho_v^s$ and the correlation coefficient $\rho_u$, conditioned on both
variables, for the bivariate Gaussian, the Student's model, the Gaussian factor
model and the Student's factor model. These results provide a quantitative
proof that conditioning on exceedance leads to conditional correlation
coefficients that may be very different from the unconditional correlation. This
provides a straightforward mechanism for fluctuations or changes of
correlations, based on fluctuations of volatility or changes of trends. In other words,
the many reported variations of correlation structure might be in large part
attributed to changes in volatility (and statistical uncertainty).

The distinct dependences as a function of exceedance v and u of the condi- 
tional correlation coefficients offer novel tools for characterizing the statistical 
multivariate distributions of extreme events. Since their direct characteriza- 
tion is in general restricted by the curse of dimensionality and the scarcity of 
data, the conditional correlation coefficients provide reduced statistics which 
can be estimated with reasonable accuracy and reliability at least when the 
pdf of the data decays faster than any hyperbolic function with tail index 
equal to 2. In this respect, the empirical results suggest that a Student’s cop- 
ula, or more generally an elliptical copula, with a tail index of about three 
accounts for the main extreme dependence properties investigated here. This 
result is not really surprising since Chap. 5 has shown that Student’s cop- 
ula is a reasonable choice to account for the dependence structure between 
foreign exchange rates. In the present case, since the value of any domestic 
stock index has been converted into the US dollar, the influence of the depen- 
dence structure of foreign exchange rates can be considered as dominant in 
comparison with the dependence structure between each domestic stock index 
expressed in local currency. This dominance of the dependence structure of 
foreign exchange rates seems particularly true during turmoil periods. 

Table 6.4 gives the asymptotic values of $\rho_v^+$, $\rho_v^s$ and $\rho_u$ for $v \to +\infty$ and
$u \to \infty$ in order to compare them with the tail dependence $\lambda$.

These two tables only scratch the surface of the rich sets of measures of
tail and extreme dependences. We have already stressed that complete
independence implies the absence of tail dependence, $\lambda = 0$, but that $\lambda = 0$ does
not imply independence, at least in the intermediate range, since it is only an
asymptotic property. Conversely, a nonzero tail dependence $\lambda$ implies the
absence of asymptotic independence. Nonetheless, it does not necessarily imply
that the conditional correlation coefficients $\rho^+_{v \to \infty}$ and $\rho^s_{v \to \infty}$ are nonzero, as
one could have a priori expected.

Note that the examples of Table 6.4 are such that $\lambda = 0$ seems to go
hand-in-hand with $\rho^+_{v \to \infty} = 0$. However, the logical implication

$$(\lambda = 0) \;\Rightarrow\; (\rho^+_{v \to \infty} = 0)$$

does not hold in general. A counterexample is offered by the Student's factor
model in the case where $\nu_Y > \nu_\varepsilon$ (the tail of the distribution of the
idiosyncratic noise is fatter than that of the distribution of the factor). In this
case, $X$ and $Y$ have the same tail dependence as $\varepsilon$ and $Y$, which is zero by
construction. But $\rho^+_{v \to \infty}$ and $\rho^s_{v \to \infty}$ are both one because a large $Y$ almost


Table 6.3. Large $v$ and $u$ dependence of the conditional correlations $\rho_v^+$ (signed condition), $\rho_v^s$ (unsigned condition) and $\rho_u$ (on
both variables) for the different models discussed in this chapter, described in the first column. The numbers in parentheses give the
equation numbers from which the formulas are derived. The factor model is defined by (6.15), i.e., $X = \beta Y + \varepsilon$. $\rho$ is the unconditional
correlation coefficient


po po Pu 
Bivariate Gaussian 2 -+ (6.3) sgn(p) - (1 —4 ig +) (6.7) pqs -<y (6.20) 
Bivariate student’s —— (6.13) — (6.14) - 
[er+v—1/ 2 (1p?) (et+pnv = (1p?) 
Gaussian factor model same as (6.3) same as (6.7) same as (6.20) 


Student’s factor model sgn(3)-(1— 5) (6.18) sen(3)-(1— 5) (6.18) - 
Table 6.4. Asymptotic values of $\rho_v^+$, $\rho_v^s$ and $\rho_u$ for $v \to +\infty$ and $u \to \infty$ and comparison with the tail dependence $\lambda$ and $\bar\lambda$ for the
four models indicated in the first column. The factor model is defined by (6.15), i.e., $X = \beta Y + \varepsilon$. $\rho$ is the unconditional correlation
coefficient. For Student's factor model, $Y$ and $\varepsilon$ have centered Student's distributions with the same number $\nu$ of degrees of freedom
and their scale factors are respectively equal to 1 and $\sigma$, so that $\rho = (1 + \sigma^2/\beta^2)^{-1/2}$. For the bivariate Student's distribution, we refer
to Table 6.3 for the constant values of $\rho^+_{v \to \infty}$ and $\rho^s_{v \to \infty}$


(eee pias pares » » 

Bivariate Gaussian 0 sgn(p) 0 0 p 
Bivariate student’s see Table 6.3. see Table 6.3 — 2-Tya1 (vo +1,/42) 1 
Gaussian factor model 0 sgn(p) 0 0 p 
Student’s factor model sgn(3) sgn(Z) weer 1,830} 1 


always gives a large $X$ and the simultaneous occurrence of a large $Y$ and a
large $\varepsilon$ can be neglected. The reason for this absence of tail dependence (in the
sense of $\lambda$) coming together with asymptotically strong conditional correlation
coefficients stems from two facts:


• First, the conditional correlation coefficients put much less weight on the
  extreme tails than the tail-dependence parameter $\lambda$. In other words, $\rho^+_{v \to \infty}$
  and $\rho^s_{v \to \infty}$ are sensitive to the marginals, i.e., they are determined by the
  full bivariate distribution, while, as we said, $\lambda$ is a pure copula property
  independent of the marginals. Since $\rho^+_{v \to \infty}$ and $\rho^s_{v \to \infty}$ are measures of
  extreme dependence weighted by the specific shapes of the marginals, it is
  natural that they may behave differently.

• Second, the tail dependence $\lambda$ probes the extreme dependence property
  of the original copula of the random variables $X$ and $Y$. On the contrary,
  when conditioning on $Y$, one changes the copula of $X$ and $Y$, so that the
  extreme dependence properties investigated by the conditional correlations
  are not exactly those of the original copula. This last remark explains
  clearly what Boyer et al. [78] call a “bias” in the conditional correlations.
  Indeed, changing the dependence between two random variables obviously
  changes their correlations.


There are important consequences to these facts. One can encounter situations in
which one measure ($\lambda$) would conclude on asymptotic tail independence while
the other measures $\rho^+_{v \to \infty}$ and $\rho^s_{v \to \infty}$ would conclude the opposite. Therefore,
before concluding on a change in the dependence structure with respect to
a given parameter (the volatility or the trend, for instance), one should
check that this change does not result from the tool used to probe the
dependence. These results shed new light on recent controversial results about the
occurrence or absence of contagion during the Latin American crises. As in
every previous work, the analysis reported in this chapter finds no evidence of
contagion between Chile and Mexico but, contrary to [178], it is difficult to
ignore the possibility of contagion toward Argentina and Brazil, in agreement
with [87].

In fact, most of the discrepancies between these different studies probably
stem from the fact that the conditional correlation coefficient does not provide
an accurate tool for probing the potential changes of dependence. Indeed, even
when the bias has been accounted for, the fat-tailedness of the distributions of
returns is such that Pearson's coefficient is subject to very strong
statistical fluctuations which forbid an accurate estimation of the correlation.
Moreover, when studying the dependence properties, it is interesting to free
oneself from the marginal behavior of each random variable. This is why the
conditional Spearman's rho seems a good tool: it only depends on the copula
and is statistically well-behaved.

The conditional Spearman's rho has identified a change in the dependence
structure during downward trends in Latin American markets, similar to that
found by Longin and Solnik [315] in their study of the contagion across five
major equity markets. It has also brought to light the asymmetry in the
contagion effects: Mexico and Chile can be potential sources of contagion toward
Argentina and Brazil, while the reverse does not seem to hold. This
phenomenon has been observed during the 1994 Mexican crisis and appears to remain
true in the recent Argentinean crisis, for which only Brazil seems to exhibit
the signature of a possible contagion.

The origin of the discovered asymmetry may lie in the difference between
the more market-oriented countries and the more state-intervention-oriented
economies, giving rise either to floating currency regimes adapted to an
important manufacturing sector, which tend to deliver more competitive real
exchange rates (Chile and Mexico), or to fixed rate pegs (Argentina until the
2001 crisis and Brazil until the early 1999 crisis) [187, 188, 189]. The
asymmetry of the contagion is compatible with the view that fixed exchange rates
tie an economy and its stock market more tightly to external shocks (case
of Argentina and Brazil) while a more flexible exchange rate seems to
provide a cushion allowing a decoupling between the stock market and external
influences.

Finally, the absence of contagion does not necessarily imply the absence
of contamination. Indeed, the study of the coefficient of tail dependence has
shown that, with or without contagion mechanisms (i.e., increase in the
linkage between markets during crisis), the probability of extreme co-movements
during the crisis (i.e., the contamination) is almost the same for all pairs of
markets. Thus, whatever the propagation mechanism may be (historically
strong relationship or irrational fear and herd behavior), the observed effects
are the same: the propagation of the crisis. From the practical perspective of
risk management or regulatory policy, this last point is perhaps more
important than the real knowledge of the occurrence or not of contagion.
6.A Correlation Coefficient for Gaussian Variables Conditioned
on Both X and Y Larger Than u


Let us consider a pair of Normal random variables $(X, Y) \sim \mathcal{N}(0, \Sigma)$ where $\Sigma$
is their covariance matrix with unconditional correlation coefficient $\rho$. Without
loss of generality, and for simplicity, we shall assume that $\Sigma$ has unconditional
variances equal to 1. By definition, the conditional correlation coefficient $\rho_u$,
conditioned on both $X$ and $Y$ larger than $u$, is

$$\rho_u = \frac{\mathrm{Cov}[X, Y \mid X > u, Y > u]}{\sqrt{\mathrm{Var}[X \mid X > u, Y > u]}\,\sqrt{\mathrm{Var}[Y \mid X > u, Y > u]}} \tag{6.A.1}$$

$$\phantom{\rho_u} = \frac{m_{11} - m_{10} \cdot m_{01}}{\sqrt{m_{20} - m_{10}^2}\,\sqrt{m_{02} - m_{01}^2}}, \tag{6.A.2}$$

where $m_{ij}$ denotes $\mathrm{E}[X^i \cdot Y^j \mid X > u, Y > u]$.
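Using Proposition A.1 of [15] or the expressions in [252, p. 113], we can
assert that

$$m_{10}\, L(u, u; \rho) = (1 + \rho)\, \varphi(u) \left[ 1 - \Phi\!\left( \sqrt{\frac{1-\rho}{1+\rho}}\; u \right) \right], \tag{6.A.3}$$

$$m_{20}\, L(u, u; \rho) = (1 + \rho^2)\, u\, \varphi(u) \left[ 1 - \Phi\!\left( \sqrt{\frac{1-\rho}{1+\rho}}\; u \right) \right] + \frac{\rho \sqrt{1-\rho^2}}{\sqrt{2\pi}}\, \varphi\!\left( \sqrt{\frac{2}{1+\rho}}\; u \right) + L(u, u; \rho), \tag{6.A.4}$$

$$m_{11}\, L(u, u; \rho) = 2\rho\, u\, \varphi(u) \left[ 1 - \Phi\!\left( \sqrt{\frac{1-\rho}{1+\rho}}\; u \right) \right] + \frac{\sqrt{1-\rho^2}}{\sqrt{2\pi}}\, \varphi\!\left( \sqrt{\frac{2}{1+\rho}}\; u \right) + \rho\, L(u, u; \rho), \tag{6.A.5}$$

where $L(\cdot, \cdot; \cdot)$ denotes the bivariate Gaussian survival (or complementary
cumulative) distribution:

$$L(h, k; \rho) = \frac{1}{2\pi \sqrt{1-\rho^2}} \int_h^\infty dx \int_k^\infty dy\; \exp\left( -\frac{1}{2}\, \frac{x^2 - 2\rho x y + y^2}{1-\rho^2} \right), \tag{6.A.6}$$

$\varphi(\cdot)$ is the Gaussian density:

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}, \tag{6.A.7}$$

and $\Phi(\cdot)$ is the cumulative Gaussian distribution:

$$\Phi(x) = \int_{-\infty}^{x} du\; \varphi(u). \tag{6.A.8}$$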


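6.A.1 Asymptotic Behavior of $L(u, u; \rho)$


Let us focus on the asymptotic behavior of $L(u, u; \rho)$, where $L(h, k; \rho)$ is
defined by (6.A.6), for large $u$. Performing the change of variables $x' = x - u$
and $y' = y - u$, we can write

$$L(u, u; \rho) = \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi \sqrt{1-\rho^2}} \int_0^\infty dx' \int_0^\infty dy'\; \exp\left( -u\, \frac{x' + y'}{1+\rho} \right) \times \exp\left( -\frac{1}{2}\, \frac{x'^2 - 2\rho x' y' + y'^2}{1-\rho^2} \right). \tag{6.A.9}$$

Using the fact that

$$\exp\left( -\frac{1}{2}\, \frac{x'^2 - 2\rho x' y' + y'^2}{1-\rho^2} \right) = 1 - \frac{x'^2 - 2\rho x' y' + y'^2}{2(1-\rho^2)} + \frac{(x'^2 - 2\rho x' y' + y'^2)^2}{8(1-\rho^2)^2} - \frac{(x'^2 - 2\rho x' y' + y'^2)^3}{48(1-\rho^2)^3} + \cdots, \tag{6.A.10}$$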
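and applying Theorem 3.1.1 in [247, p. 68] (Laplace's method), (6.A.9) and
(6.A.10) yield

$$L(u, u; \rho) = \frac{(1+\rho)^2}{2\pi \sqrt{1-\rho^2}}\, \frac{e^{-\frac{u^2}{1+\rho}}}{u^2} \left[ 1 - \frac{(2-\rho)(1+\rho)}{1-\rho}\, \frac{1}{u^2} + \frac{(2\rho^2 - 6\rho + 7)(1+\rho)^2}{(1-\rho)^2}\, \frac{1}{u^4} - 3\, \frac{(12 - 13\rho + 8\rho^2 - 2\rho^3)(1+\rho)^3}{(1-\rho)^3}\, \frac{1}{u^6} + O\!\left( \frac{1}{u^8} \right) \right], \tag{6.A.11}$$

and

$$\frac{1}{L(u, u; \rho)} = \frac{2\pi \sqrt{1-\rho^2}}{(1+\rho)^2}\, u^2\, e^{\frac{u^2}{1+\rho}} \left[ 1 + \frac{(2-\rho)(1+\rho)}{1-\rho}\, \frac{1}{u^2} - \frac{(3 - 2\rho + \rho^2)(1+\rho)^2}{(1-\rho)^2}\, \frac{1}{u^4} + \frac{(16 - 13\rho + 10\rho^2 - 3\rho^3)(1+\rho)^3}{(1-\rho)^3}\, \frac{1}{u^6} + O\!\left( \frac{1}{u^8} \right) \right]. \tag{6.A.12}$$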
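
The expansion of $L(u, u; \rho)$ can be checked numerically. The sketch below is an illustration under stated assumptions: it evaluates $L(u, u; \rho)$ by Simpson quadrature of $\int_u^\infty \varphi(x)\, \Pr\{Y > u \mid X = x\}\, dx$ (truncated at $u + 12$) and compares it with the series truncated after the $1/u^6$ term, with the coefficients $1$, $-(2-\rho)c$, $(2\rho^2-6\rho+7)c^2$, $-3(12-13\rho+8\rho^2-2\rho^3)c^3$ and $c = (1+\rho)/(1-\rho)$. The function names and the truncation width are hypothetical choices.

```python
import math

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gauss_sf(x):
    """Standard Gaussian survival function 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bivariate_sf(u, rho, width=12.0, n=4000):
    """L(u, u; rho) by Simpson's rule on the conditional decomposition."""
    s = math.sqrt(1.0 - rho * rho)
    f = lambda x: phi(x) * gauss_sf((u - rho * x) / s)
    h = width / n
    total = f(u) + f(u + width)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(u + i * h)
    return total * h / 3.0

def bivariate_sf_asymptotic(u, rho):
    """Large-u series for L(u, u; rho), truncated after the 1/u^6 term."""
    c = (1.0 + rho) / (1.0 - rho)
    series = (1.0
              - (2 - rho) * c / u**2
              + (2 * rho**2 - 6 * rho + 7) * c**2 / u**4
              - 3 * (12 - 13 * rho + 8 * rho**2 - 2 * rho**3) * c**3 / u**6)
    prefactor = (1 + rho)**2 * math.exp(-u * u / (1 + rho)) / (
        2 * math.pi * math.sqrt(1 - rho * rho) * u * u)
    return prefactor * series
```

Since the series is only asymptotic, the agreement improves as $u$ grows and degrades quickly when $\rho$ approaches one.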


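6.A.2 Asymptotic Behavior of the First Moment $m_{10}$


The first moment $m_{10} = \mathrm{E}[X \mid X > u, Y > u]$ is given by (6.A.3). For large
$u$,

$$1 - \Phi\!\left( \sqrt{\frac{1-\rho}{1+\rho}}\; u \right) = \frac{1}{2}\, \mathrm{erfc}\!\left( \sqrt{\frac{1-\rho}{2(1+\rho)}}\; u \right) \tag{6.A.13}$$

$$\phantom{1 - \Phi} = \sqrt{\frac{1+\rho}{1-\rho}}\; \frac{e^{-\frac{1-\rho}{2(1+\rho)} u^2}}{\sqrt{2\pi}\, u} \left[ 1 - \frac{1+\rho}{1-\rho}\, \frac{1}{u^2} + 3 \left( \frac{1+\rho}{1-\rho} \right)^2 \frac{1}{u^4} - 15 \left( \frac{1+\rho}{1-\rho} \right)^3 \frac{1}{u^6} + O\!\left( \frac{1}{u^8} \right) \right], \tag{6.A.14}$$

so that multiplying by $(1 + \rho)\, \varphi(u)$, we obtain

$$m_{10}\, L(u, u; \rho) = \frac{(1+\rho)^2}{\sqrt{1-\rho^2}}\, \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi\, u} \left[ 1 - \frac{1+\rho}{1-\rho}\, \frac{1}{u^2} + 3 \left( \frac{1+\rho}{1-\rho} \right)^2 \frac{1}{u^4} - 15 \left( \frac{1+\rho}{1-\rho} \right)^3 \frac{1}{u^6} + O\!\left( \frac{1}{u^8} \right) \right]. \tag{6.A.15}$$

Using the result given by equation (6.A.11), we can conclude that

$$m_{10} = u + (1+\rho)\, \frac{1}{u} - \frac{(1+\rho)^2 (2-\rho)}{1-\rho}\, \frac{1}{u^3} + \frac{(10 - 8\rho + 3\rho^2)(1+\rho)^3}{(1-\rho)^2}\, \frac{1}{u^5} + O\!\left( \frac{1}{u^7} \right). \tag{6.A.16}$$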




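In the sequel, we will also need the behavior of $m_{10}^2$:

$$m_{10}^2 = u^2 + 2(1+\rho) - \frac{(1+\rho)^2 (3-\rho)}{1-\rho}\, \frac{1}{u^2} + 2\, \frac{(8 - 5\rho + 2\rho^2)(1+\rho)^3}{(1-\rho)^2}\, \frac{1}{u^4} + O\!\left( \frac{1}{u^6} \right). \tag{6.A.17}$$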


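6.A.3 Asymptotic Behavior of the Second Moment $m_{20}$


The second moment $m_{20} = \mathrm{E}[X^2 \mid X > u, Y > u]$ is given by expression
(6.A.4). The first term in the right-hand side of (6.A.4) yields

$$(1 + \rho^2)\, u\, \varphi(u) \left[ 1 - \Phi\!\left( \sqrt{\frac{1-\rho}{1+\rho}}\; u \right) \right] = (1 + \rho^2) \sqrt{\frac{1+\rho}{1-\rho}}\; \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi} \times \left[ 1 - \frac{1+\rho}{1-\rho}\, \frac{1}{u^2} + 3 \left( \frac{1+\rho}{1-\rho} \right)^2 \frac{1}{u^4} - 15 \left( \frac{1+\rho}{1-\rho} \right)^3 \frac{1}{u^6} + O\!\left( \frac{1}{u^8} \right) \right], \tag{6.A.18}$$

while the second term gives

$$\frac{\rho \sqrt{1-\rho^2}}{\sqrt{2\pi}}\, \varphi\!\left( \sqrt{\frac{2}{1+\rho}}\; u \right) = \rho \sqrt{1-\rho^2}\; \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi}. \tag{6.A.19}$$

Putting these two expressions together and factorizing the term $(1+\rho)/(1+\rho^2)$
gives

$$m_{20}\, L(u, u; \rho) = \frac{(1+\rho)^2}{\sqrt{1-\rho^2}}\, \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi} \left[ 1 - \frac{1+\rho^2}{1-\rho}\, \frac{1}{u^2} + 3\, \frac{(1+\rho^2)(1+\rho)}{(1-\rho)^2}\, \frac{1}{u^4} - 15\, \frac{(1+\rho^2)(1+\rho)^2}{(1-\rho)^3}\, \frac{1}{u^6} + O\!\left( \frac{1}{u^8} \right) \right] + L(u, u; \rho), \tag{6.A.20}$$

which finally yields

$$m_{20} = u^2 + 2(1+\rho) - 2\, \frac{(1+\rho)^2}{1-\rho}\, \frac{1}{u^2} + 2\, \frac{(5 + 4\rho + \rho^3)(1+\rho)^2}{(1-\rho)^2}\, \frac{1}{u^4} + O\!\left( \frac{1}{u^6} \right). \tag{6.A.21}$$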


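6.A.4 Asymptotic Behavior of the Cross Moment $m_{11}$


The cross moment $m_{11} = \mathrm{E}[X \cdot Y \mid X > u, Y > u]$ is given by expression
(6.A.5). The first and second terms in the right-hand side of (6.A.5)
respectively give

$$2\rho\, u\, \varphi(u) \left[ 1 - \Phi\!\left( \sqrt{\frac{1-\rho}{1+\rho}}\; u \right) \right] = 2\rho \sqrt{\frac{1+\rho}{1-\rho}}\; \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi} \left[ 1 - \frac{1+\rho}{1-\rho}\, \frac{1}{u^2} + 3 \left( \frac{1+\rho}{1-\rho} \right)^2 \frac{1}{u^4} - 15 \left( \frac{1+\rho}{1-\rho} \right)^3 \frac{1}{u^6} + O\!\left( \frac{1}{u^8} \right) \right], \tag{6.A.22}$$

$$\frac{\sqrt{1-\rho^2}}{\sqrt{2\pi}}\, \varphi\!\left( \sqrt{\frac{2}{1+\rho}}\; u \right) = \sqrt{1-\rho^2}\; \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi}, \tag{6.A.23}$$

which, after factorization by $(1+\rho)/\rho$, yields

$$m_{11}\, L(u, u; \rho) = \frac{(1+\rho)^2}{\sqrt{1-\rho^2}}\, \frac{e^{-\frac{u^2}{1+\rho}}}{2\pi} \left[ 1 - \frac{2\rho}{1-\rho}\, \frac{1}{u^2} + 6\, \frac{\rho (1+\rho)}{(1-\rho)^2}\, \frac{1}{u^4} - 30\, \frac{\rho (1+\rho)^2}{(1-\rho)^3}\, \frac{1}{u^6} + O\!\left( \frac{1}{u^8} \right) \right] + \rho\, L(u, u; \rho), \tag{6.A.24}$$

and finally

$$m_{11} = u^2 + 2(1+\rho) - \frac{(1+\rho)^2 (3-\rho)}{1-\rho}\, \frac{1}{u^2} + \frac{(16 - 9\rho + 3\rho^2)(1+\rho)^3}{(1-\rho)^2}\, \frac{1}{u^4} + O\!\left( \frac{1}{u^6} \right). \tag{6.A.25}$$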


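6.A.5 Asymptotic Behavior of the Correlation Coefficient


The conditional correlation coefficient conditioned on both $X$ and $Y$ larger
than $u$ is defined by (6.A.2). Using the symmetry between $X$ and $Y$, we have
$m_{10} = m_{01}$ and $m_{20} = m_{02}$, which allows us to rewrite (6.A.2) as follows:

$$\rho_u = \frac{m_{11} - m_{10}^2}{m_{20} - m_{10}^2}. \tag{6.A.26}$$

Putting together the previous results, we have

$$m_{20} - m_{10}^2 = (1+\rho)^2\, \frac{1}{u^2} - 2\, \frac{(3-\rho)(1+\rho)^3}{1-\rho}\, \frac{1}{u^4} + O\!\left( \frac{1}{u^6} \right), \tag{6.A.27}$$

$$m_{11} - m_{10}^2 = \rho\, \frac{(1+\rho)^3}{1-\rho}\, \frac{1}{u^4} + O\!\left( \frac{1}{u^6} \right), \tag{6.A.28}$$

which proves that

$$\rho_u = \rho\, \frac{1+\rho}{1-\rho}\, \frac{1}{u^2} + O\!\left( \frac{1}{u^4} \right) \quad \text{and} \quad \rho \in [-1, 1). \tag{6.A.29}$$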
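
The exact conditional correlation can also be evaluated directly from the moment formulas, which gives a way to see how slowly the asymptotic regime is reached. The sketch below is illustrative: it assumes the truncated-moment expressions (6.A.3) to (6.A.5) in the form $m_{10} L = (1+\rho)\varphi(u)[1-\Phi(\cdot)]$, $m_{20} L = (1+\rho^2) u \varphi(u)[1-\Phi(\cdot)] + \rho\, b + L$ and $m_{11} L = 2\rho u \varphi(u)[1-\Phi(\cdot)] + b + \rho L$, with $b = \sqrt{1-\rho^2}\, \varphi(\sqrt{2/(1+\rho)}\, u)/\sqrt{2\pi}$, and computes $L(u,u;\rho)$ by Simpson quadrature. All function names are hypothetical.

```python
import math

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gauss_sf(x):
    """Standard Gaussian survival function 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bivariate_sf(u, rho, width=12.0, n=4000):
    """L(u, u; rho) by Simpson's rule on int_u^inf phi(x) Pr{Y > u | X = x} dx."""
    s = math.sqrt(1.0 - rho * rho)
    f = lambda x: phi(x) * gauss_sf((u - rho * x) / s)
    h = width / n
    total = f(u) + f(u + width)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(u + i * h)
    return total * h / 3.0

def conditional_moments(u, rho):
    """(m10, m20, m11) given X > u and Y > u, for unit-variance Gaussian variables."""
    L = bivariate_sf(u, rho)
    tail = gauss_sf(math.sqrt((1 - rho) / (1 + rho)) * u)
    bump = (math.sqrt(1 - rho * rho) / math.sqrt(2 * math.pi)
            * phi(math.sqrt(2 / (1 + rho)) * u))
    m10 = (1 + rho) * phi(u) * tail / L
    m20 = ((1 + rho * rho) * u * phi(u) * tail + rho * bump) / L + 1.0
    m11 = (2 * rho * u * phi(u) * tail + bump) / L + rho
    return m10, m20, m11

def rho_u_exact(u, rho):
    """Conditional correlation (m11 - m10^2) / (m20 - m10^2)."""
    m10, m20, m11 = conditional_moments(u, rho)
    return (m11 - m10 * m10) / (m20 - m10 * m10)

def rho_u_asymptotic(u, rho):
    """Leading large-u behavior rho (1 + rho) / (1 - rho) / u^2."""
    return rho * (1 + rho) / (1 - rho) / (u * u)
```

Comparing `rho_u_exact(u, rho)` with `rho_u_asymptotic(u, rho)` for increasing `u` shows that the leading term is only reliable for thresholds well beyond those reachable with typical financial samples; for very large `u` the exact evaluation itself loses accuracy because it is a ratio of small differences.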
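6.B Conditional Correlation Coefficient for Student's Variables

6.B.1 Proposition


Let us consider a pair of Student's random variables $(X, Y)$ with $\nu > 2$ degrees
of freedom and unconditional correlation coefficient $\rho$. Let $A$ be a subset of $\mathbb{R}$
such that $\Pr\{Y \in A\} > 0$. The correlation coefficient of $(X, Y)$, conditioned
on $Y \in A$, defined by

$$\rho_A = \frac{\mathrm{Cov}(X, Y \mid Y \in A)}{\sqrt{\mathrm{Var}(X \mid Y \in A)}\, \sqrt{\mathrm{Var}(Y \mid Y \in A)}} \tag{6.B.30}$$

can be expressed as

$$\rho_A = \frac{\rho}{\sqrt{\rho^2 + \dfrac{\mathrm{E}[\mathrm{E}(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A]}{\mathrm{Var}(Y \mid Y \in A)}}}, \tag{6.B.31}$$

with

$$\mathrm{Var}(Y \mid Y \in A) = \nu \left[ \frac{\nu - 1}{\nu - 2}\, \frac{\Pr\left\{ \sqrt{\frac{\nu}{\nu-2}}\, Y \in A \mid \nu - 2 \right\}}{\Pr\{Y \in A \mid \nu\}} - 1 \right] - \left[ \frac{\int_{y \in A} dy\; y \cdot t_\nu(y)}{\Pr\{Y \in A \mid \nu\}} \right]^2, \tag{6.B.32}$$

where $t_\nu(y)$ is given below by (6.B.36) and

$$\mathrm{E}[\mathrm{E}(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A] = (1 - \rho^2)\, \frac{\nu}{\nu - 2}\, \frac{\Pr\left\{ \sqrt{\frac{\nu}{\nu-2}}\, Y \in A \mid \nu - 2 \right\}}{\Pr\{Y \in A \mid \nu\}}. \tag{6.B.33}$$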

6.B.2 Proof of the Proposition 


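Let the variables $X$ and $Y$ have a multivariate Student distribution with $\nu > 2$
degrees of freedom and a correlation coefficient $\rho$:

$$P_{XY}(x, y) = \frac{\Gamma\!\left( \frac{\nu+2}{2} \right)}{\nu \pi\, \Gamma\!\left( \frac{\nu}{2} \right) \sqrt{1-\rho^2}} \left( 1 + \frac{x^2 - 2\rho x y + y^2}{\nu (1-\rho^2)} \right)^{-\frac{\nu+2}{2}} \tag{6.B.34}$$

$$\phantom{P_{XY}(x, y)} = \left( \frac{\nu+1}{\nu+y^2} \right)^{1/2} \frac{1}{\sqrt{1-\rho^2}}\; t_\nu(y) \cdot t_{\nu+1}\!\left( \left( \frac{\nu+1}{\nu+y^2} \right)^{1/2} \frac{x - \rho y}{\sqrt{1-\rho^2}} \right), \tag{6.B.35}$$

where $t_\nu(\cdot)$ denotes the univariate Student density with $\nu$ degrees of freedom

$$t_\nu(x) = \frac{\Gamma\!\left( \frac{\nu+1}{2} \right)}{\Gamma\!\left( \frac{\nu}{2} \right) (\pi \nu)^{1/2}} \cdot \frac{1}{\left( 1 + \frac{x^2}{\nu} \right)^{\frac{\nu+1}{2}}} = \frac{C_\nu}{\left( 1 + \frac{x^2}{\nu} \right)^{\frac{\nu+1}{2}}}. \tag{6.B.36}$$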
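Let us evaluate $\mathrm{Cov}(X, Y \mid Y \in A)$:

$$\mathrm{Cov}(X, Y \mid Y \in A) = \mathrm{E}(X \cdot Y \mid Y \in A) - \mathrm{E}(X \mid Y \in A) \cdot \mathrm{E}(Y \mid Y \in A)$$

$$\phantom{\mathrm{Cov}} = \mathrm{E}(\mathrm{E}(X \mid Y) \cdot Y \mid Y \in A) - \mathrm{E}(\mathrm{E}(X \mid Y) \mid Y \in A) \cdot \mathrm{E}(Y \mid Y \in A). \tag{6.B.37}$$

As can be seen in equation (6.B.35), $\mathrm{E}(X \mid Y) = \rho Y$, which gives

$$\mathrm{Cov}(X, Y \mid Y \in A) = \rho \cdot \mathrm{E}(Y^2 \mid Y \in A) - \rho \cdot \mathrm{E}(Y \mid Y \in A)^2 \tag{6.B.38}$$

$$\phantom{\mathrm{Cov}(X, Y \mid Y \in A)} = \rho \cdot \mathrm{Var}(Y \mid Y \in A). \tag{6.B.39}$$

Thus, we have

$$\rho_A = \rho\, \sqrt{\frac{\mathrm{Var}(Y \mid Y \in A)}{\mathrm{Var}(X \mid Y \in A)}}. \tag{6.B.40}$$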

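Using the same method as for the calculation of $\mathrm{Cov}(X, Y \mid Y \in A)$, we
find

$$\mathrm{Var}(X \mid Y \in A) = \mathrm{E}[\mathrm{E}(X^2 \mid Y) \mid Y \in A] - \mathrm{E}[\mathrm{E}(X \mid Y) \mid Y \in A]^2$$

$$\phantom{\mathrm{Var}} = \mathrm{E}[\mathrm{E}(X^2 \mid Y) \mid Y \in A] - \rho^2 \cdot \mathrm{E}[Y \mid Y \in A]^2 \tag{6.B.41}$$

$$\phantom{\mathrm{Var}} = \mathrm{E}[\mathrm{E}(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A] + \rho^2 \cdot \mathrm{Var}[Y \mid Y \in A],$$

which yields (6.B.31).

To go one step further, we have to evaluate the three terms $\mathrm{E}(Y \mid Y \in A)$,
$\mathrm{E}(Y^2 \mid Y \in A)$, and $\mathrm{E}[\mathrm{E}(X^2 \mid Y) \mid Y \in A]$.

The first one is trivial to calculate:

$$\mathrm{E}(Y \mid Y \in A) = \frac{\int_{y \in A} dy\; y \cdot t_\nu(y)}{\Pr\{Y \in A \mid \nu\}}. \tag{6.B.42}$$

The second one gives

$$\mathrm{E}(Y^2 \mid Y \in A) = \frac{\int_{y \in A} dy\; y^2 \cdot t_\nu(y)}{\Pr\{Y \in A \mid \nu\}} \tag{6.B.43}$$

$$\phantom{\mathrm{E}(Y^2 \mid Y \in A)} = \nu \left[ \frac{\nu - 1}{\nu - 2}\, \frac{\Pr\left\{ \sqrt{\frac{\nu}{\nu-2}}\, Y \in A \mid \nu - 2 \right\}}{\Pr\{Y \in A \mid \nu\}} - 1 \right], \tag{6.B.44}$$

so that

$$\mathrm{Var}(Y \mid Y \in A) = \nu \left[ \frac{\nu - 1}{\nu - 2}\, \frac{\Pr\left\{ \sqrt{\frac{\nu}{\nu-2}}\, Y \in A \mid \nu - 2 \right\}}{\Pr\{Y \in A \mid \nu\}} - 1 \right] - \left[ \frac{\int_{y \in A} dy\; y \cdot t_\nu(y)}{\Pr\{Y \in A \mid \nu\}} \right]^2. \tag{6.B.45}$$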

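To calculate the third term, we first need to evaluate $\mathrm{E}(X^2 \mid Y)$. Using
equation (6.B.35) and the results given in [1], we find

$$\mathrm{E}(X^2 \mid Y) = \int dx\; x^2 \left( \frac{\nu+1}{\nu+Y^2} \right)^{1/2} \frac{1}{\sqrt{1-\rho^2}}\; t_{\nu+1}\!\left( \left( \frac{\nu+1}{\nu+Y^2} \right)^{1/2} \frac{x - \rho Y}{\sqrt{1-\rho^2}} \right) = \frac{\nu + Y^2}{\nu - 1}\, (1 - \rho^2) + \rho^2 Y^2, \tag{6.B.46}$$

which yields

$$\mathrm{E}[\mathrm{E}(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A] = \frac{\nu}{\nu - 1}\, (1 - \rho^2) + \frac{1 - \rho^2}{\nu - 1}\, \mathrm{E}[Y^2 \mid Y \in A], \tag{6.B.47}$$

and applying the result given in equation (6.B.44), we finally obtain

$$\mathrm{E}[\mathrm{E}(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A] = (1 - \rho^2)\, \frac{\nu}{\nu - 2}\, \frac{\Pr\left\{ \sqrt{\frac{\nu}{\nu-2}}\, Y \in A \mid \nu - 2 \right\}}{\Pr\{Y \in A \mid \nu\}}, \tag{6.B.48}$$

which concludes the proof.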


6.B.3 Conditioning on Y Larger Than v 


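The conditioning set is $A = [v, +\infty)$, thus

$$\Pr\{Y \in A \mid \nu\} = \bar T_\nu(v) = \frac{\nu^{\frac{\nu-1}{2}}\, C_\nu}{v^\nu} + O\!\left( v^{-(\nu+2)} \right), \tag{6.B.49}$$

$$\Pr\left\{ \sqrt{\tfrac{\nu}{\nu-2}}\, Y \in A \mid \nu - 2 \right\} = \bar T_{\nu-2}\!\left( \sqrt{\tfrac{\nu-2}{\nu}}\; v \right) = \frac{\nu^{\frac{\nu-2}{2}}}{\sqrt{\nu - 2}}\, \frac{C_{\nu-2}}{v^{\nu-2}} + O\!\left( v^{-\nu} \right), \tag{6.B.50}$$

$$\int_{y \in A} dy\; y \cdot t_\nu(y) = \sqrt{\frac{\nu}{\nu-2}}\; t_{\nu-2}\!\left( \sqrt{\tfrac{\nu-2}{\nu}}\; v \right) = \sqrt{\frac{\nu}{\nu-2}}\; \frac{C_{\nu-2}\, \nu^{\frac{\nu-1}{2}}}{v^{\nu-1}} + O\!\left( v^{-(\nu+1)} \right), \tag{6.B.51}$$

where $t_\nu(\cdot)$ and $\bar T_\nu(\cdot)$ denote respectively the density and the Student survival
distribution with $\nu$ degrees of freedom and $C_\nu$ is defined in (6.B.36).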

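Using equation (6.B.31), one can thus give the exact expression of $\rho_v^+$.
Since it is very cumbersome, we will not write it explicitly. We will only give
the asymptotic expression of $\rho_v^+$:

$$\mathrm{Var}(Y \mid Y \in A) = \frac{\nu}{(\nu - 2)(\nu - 1)^2}\; v^2 + O(1), \tag{6.B.52}$$

$$\mathrm{E}[\mathrm{E}(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A] = \frac{\nu}{(\nu - 2)(\nu - 1)}\, (1 - \rho^2)\; v^2 + O(1). \tag{6.B.53}$$

Thus, for large $v$,

$$\rho_v^+ \longrightarrow \frac{\rho}{\sqrt{\rho^2 + (\nu - 1)(1 - \rho^2)}}. \tag{6.B.54}$$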


6.B.4 Conditioning on |Y| Larger Than v 


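The conditioning set is now $A = (-\infty, -v] \cup [v, +\infty)$, with $v \in \mathbb{R}_+$. Thus, the
right-hand sides of equations (6.B.49) and (6.B.50) have to be multiplied by
two while

$$\int_{y \in A} dy\; y \cdot t_\nu(y) = 0, \tag{6.B.55}$$

for symmetry reasons. So, equation (6.B.53) still holds while

$$\mathrm{Var}(Y \mid Y \in A) = \frac{\nu}{\nu - 2}\; v^2 + O(1). \tag{6.B.56}$$

Thus, for large $v$,

$$\rho_v^s \longrightarrow \frac{\rho}{\sqrt{\rho^2 + \frac{1}{\nu - 1}\, (1 - \rho^2)}}. \tag{6.B.57}$$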


6.B.5 Conditioning on Y > v Versus on |Y| > v 


The results (6.B.54) and (6.B.57) are valid for $\nu > 2$, as one can expect since
the second moment must exist for the correlation coefficient to be defined.
Contrary to the Gaussian case, the conditioning set is not really important.
Indeed, with both conditioning sets, $\rho_v^+$ and $\rho_v^s$ go to constants different from
zero and (plus or minus) one when $v$ goes to infinity. This striking difference
with the Gaussian case can be explained by the large fluctuations allowed by
the Student's distribution, and can be related to the fact that the coefficient of
tail dependence for this distribution does not vanish even though the variables
are anticorrelated (see Sect. 4.5.3).

Contrary to the Gaussian distribution, which binds the fluctuations of the
variables near the origin, the Student's distribution allows for “wild”
fluctuations. These properties are thus responsible for the result that, contrary to
the Gaussian case for which the conditional correlation coefficient goes to zero
when conditioned on large signed values and goes to one when conditioned on
large unsigned values, the conditional correlation coefficient for Student's
variables has a similar behavior in both cases. Intuitively, the large fluctuations
of $X$ for large $v$ dominate and control the asymptotic dependence.
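
The proposition of Sect. 6.B.1 can be evaluated numerically for both conditioning sets. The sketch below is an illustration under stated assumptions: it uses the expressions $\mathrm{E}(Y^2 \mid Y \in A) = \nu[\frac{\nu-1}{\nu-2} R - 1]$ and $\mathrm{E}[\mathrm{E}(X^2 \mid Y) - \rho^2 Y^2 \mid Y \in A] = (1-\rho^2)\frac{\nu}{\nu-2} R$, where $R$ is the ratio of the two probabilities, together with the identity $\int_v^\infty y\, t_\nu(y)\, dy = \sqrt{\nu/(\nu-2)}\; t_{\nu-2}(\sqrt{(\nu-2)/\nu}\; v)$. The Student survival function is obtained by Simpson quadrature, and all function names are hypothetical.

```python
import math

def student_pdf(x, nu):
    """Univariate Student density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def student_sf(x, nu):
    """Student survival function Pr{T > x}; intended for nu > 1."""
    if x < 0:
        return 1.0 - student_sf(-x, nu)
    if x < 1:
        return 0.5 - simpson(lambda y: student_pdf(y, nu), 0.0, x)
    # Tail integral with the substitution y = x / s, s in (0, 1].
    def g(s):
        return 0.0 if s == 0.0 else student_pdf(x / s, nu) * x / (s * s)
    return simpson(g, 0.0, 1.0)

def conditional_corr_student(rho, nu, v, signed=True):
    """rho_A for A = [v, inf) (signed) or A = {|y| >= v} with v >= 0 (unsigned)."""
    scaled_v = math.sqrt((nu - 2) / nu) * v
    p_nu = student_sf(v, nu)
    # Ratio of the two probabilities; the factor 2 of the unsigned case cancels.
    ratio = student_sf(scaled_v, nu - 2) / p_nu
    second_moment = nu * ((nu - 1) / (nu - 2) * ratio - 1.0)
    if signed:
        first_moment = math.sqrt(nu / (nu - 2)) * student_pdf(scaled_v, nu - 2) / p_nu
    else:
        first_moment = 0.0  # symmetric conditioning set
    var_y = second_moment - first_moment ** 2
    residual = (1 - rho ** 2) * nu / (nu - 2) * ratio
    return rho / math.sqrt(rho ** 2 + residual / var_y)
```

For instance, with `rho = 0.5` and `nu = 4`, evaluating the function at a large threshold such as `v = 200` can be compared with the limits $\rho/\sqrt{\rho^2 + (\nu-1)(1-\rho^2)}$ (signed) and $\rho/\sqrt{\rho^2 + (1-\rho^2)/(\nu-1)}$ (unsigned) obtained from the leading-order moments.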
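6.C Conditional Spearman's Rho


To obtain the conditional Spearman's rho defined in (6.23), we need a few
intermediate calculations. We have

$$\mathrm{E}[\,\cdot \mid V \ge \tilde v] = \frac{\int_0^1 \int_{\tilde v}^1 \cdot\; dC(u, v)}{\int_0^1 \int_{\tilde v}^1 dC(u, v)} = \frac{1}{1 - \tilde v} \int_0^1 \int_{\tilde v}^1 \cdot\; dC(u, v). \tag{6.C.58}$$

Performing a simple integration by parts, we obtain

$$\mathrm{E}[U \mid V \ge \tilde v] = 1 + \frac{1}{1 - \tilde v} \left[ \int_0^1 du\; C(u, \tilde v) - \frac{1}{2} \right], \tag{6.C.59}$$

$$\mathrm{E}[V \mid V \ge \tilde v] = \frac{1 + \tilde v}{2}, \tag{6.C.60}$$

$$\mathrm{E}[U^2 \mid V \ge \tilde v] = 1 + \frac{2}{1 - \tilde v} \left[ \int_0^1 du\; u\, C(u, \tilde v) - \frac{1}{3} \right], \tag{6.C.61}$$

$$\mathrm{E}[V^2 \mid V \ge \tilde v] = \frac{\tilde v^2 + \tilde v + 1}{3}, \tag{6.C.62}$$

$$\mathrm{E}[U \cdot V \mid V \ge \tilde v] = \frac{1 + \tilde v}{2} + \frac{1}{1 - \tilde v} \left[ \int_{\tilde v}^1 dv \int_0^1 du\; C(u, v) + \tilde v \int_0^1 du\; C(u, \tilde v) - \frac{1}{2} \right], \tag{6.C.63}$$

which yields

$$\rho_s(\tilde v) = \frac{\dfrac{12}{1 - \tilde v} \displaystyle\int_{\tilde v}^1 dv \int_0^1 du\; C(u, v) - 6 \int_0^1 du\; C(u, \tilde v) - 3}{\sqrt{1 - 4 \tilde v + 24 (1 - \tilde v) \displaystyle\int_0^1 du\; u\, C(u, \tilde v) - 12 (1 - 2 \tilde v) \int_0^1 du\; C(u, \tilde v) - 12 \left( \int_0^1 du\; C(u, \tilde v) \right)^2}}.$$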
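
As a sanity check, the conditional Spearman's rho can be computed by quadrature for any copula given as a function $C(u, v)$. The sketch below assumes the closed form in terms of the three integrals $I = \int_0^1 C(u, \tilde v)\, du$, $J = \int_0^1 u\, C(u, \tilde v)\, du$ and $K = \int_{\tilde v}^1 dv \int_0^1 du\, C(u, v)$, namely $\rho_s(\tilde v) = [12 K/(1-\tilde v) - 6 I - 3] / \sqrt{1 - 4\tilde v + 24(1-\tilde v) J - 12(1-2\tilde v) I - 12 I^2}$, and uses a simple midpoint rule. The function name and grid size are hypothetical choices.

```python
import math

def conditional_spearman_rho(copula, v_tilde, n=200):
    """Conditional Spearman's rho given V >= v_tilde, by midpoint quadrature."""
    h = 1.0 / n
    us = [(i + 0.5) * h for i in range(n)]
    # I = int_0^1 C(u, v~) du and J = int_0^1 u C(u, v~) du
    I = sum(copula(u, v_tilde) for u in us) * h
    J = sum(u * copula(u, v_tilde) for u in us) * h
    # K = int_{v~}^1 dv int_0^1 du C(u, v)
    hv = (1.0 - v_tilde) / n
    vs = [v_tilde + (j + 0.5) * hv for j in range(n)]
    K = sum(copula(u, v) for v in vs for u in us) * h * hv
    num = 12.0 * K / (1.0 - v_tilde) - 6.0 * I - 3.0
    den = (1.0 - 4.0 * v_tilde + 24.0 * (1.0 - v_tilde) * J
           - 12.0 * (1.0 - 2.0 * v_tilde) * I - 12.0 * I * I)
    return num / math.sqrt(den)
```

The independence copula $C(u, v) = uv$ should give zero for every threshold, while the comonotone copula $\min(u, v)$ and the countermonotone copula $\max(u + v - 1, 0)$ should give plus and minus one respectively.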
Summary and Outlook


7.1 Synthesis


A common theme underlying the chapters of this book is that many
important applications of risk management rely on the assessment of the positive
or negative outcomes of uncertain positions. Probability theory and
statistics, together with the valuation of losses incurred for a given exposure to
various risk factors, take a predominant place in this process. However, they
are not, by far, the sole ingredients needed in an efficient risk management
system. Quoting Andrew Lo [311], one can assert that


[Although most] current risk-management practices are based on prob- 
abilities of extreme dollar losses (e.g., measures like Value-at-Risk), 
[...] these measures capture only part of the story. Any complete risk 
management system must address two other important factors: prices 
and preferences. Together with probabilities, these comprise the three 
P’s of Total Risk Management. [Understanding] how the three P’s in- 
teract [allows] to determine sensible risk profiles for corporations and 
for individuals, guidelines for how much risk to bear and how much 
to hedge. By synthesizing existing research in economics, psychology, 
and decision sciences, and through an ambitious research agenda to 
extend this synthesis into other disciplines, a complete and systematic 
approach to rational decision making in an uncertain world is within 
reach. 


Among the three P's, Probability constitutes today, in our opinion, the
most solid pillar of risk management, because it has reached the highest level
of maturation. Compared with Price and Preference, Probability theory is
clearly the most developed in terms of its mathematical formulation, providing
important and accurate quantitative results.

Asset valuation, and therefore Price assessment, is also very developed
quantitatively, but it remains, for a large part, subordinated to the quality of
the estimation of the probabilities. Indeed, a cornerstone of modern finance
theory holds that the (fair) value of a given investment vehicle is nothing
but the mathematical expectation, under a suitable probability measure,
of the future discounted cash-flows generated by this investment vehicle. The
assessment of future cash-flows for complex investment vehicles is nothing
but an exercise of pure financial analysis but, without a correct probability
measure, this exercise has little value. This shows that Prices and
Probabilities are inextricably entangled and that an accurate price assessment
requires an accurate determination of the probabilities.

Preferences are also of crucial importance (in fact the most important of
the three P's, according to Lo) since this term embodies the entire
human decision making process. But here, in contrast to the two other P's,
our knowledge is still in its infancy. The pioneering theoretical work by von
Neumann and Morgenstern [482] has laid the foundations of a rational decision
theory. However, this theory has been undermined over the years by several
paradoxes and deficiencies [4, 5], when tested against real human preferences.
Most of the recent theories, notably those directly inspired by psychological
studies [204, 263, 352, and references therein], attempt to cure the original
rational decision theory of its inconsistencies. But one should recognize
that, while significant qualitative progress has been obtained, there is not yet
a satisfying fully operational theory of decision making.

For all these reasons, Probability still plays a dominant role in current 
risk management practice. And we firmly believe that this supremacy will 
extend well into the future, in view of the still large remaining potential for 
improvement. Of course, the modern science of human psychology and decision 
making is in constant progression and accounts better and better for the many 
anomalies observed on financial markets. However, its fusion with finance, 
which has given birth to the field of “behavioral finance,” will provide useful 
practical tools only with the development of accurate quantitative predictions. 
Until then, behavioral finance will continue to be mostly the playground of 
academic research. We thus believe that, in the next few years, the most 
important improvements in applied risk management will occur through more 
elaborate modeling of financial markets and more generally of the economic 
environment. 

In spite of the key role of Price and Preference, this book has mainly
focused on the role of Probability in the risk assessment and management
processes. The different probabilistic concepts presented in the core chapters
of this book should provide a better understanding and modeling of the various
sources of uncertainty and therefore of risk factors that investors are facing.
Our presentation has been organized around the key idea that the risk of a
set of positions can be decomposed into two major components:


(i) the marginal risks associated with the variations of wealth of each risky
position,
(ii) the cross-dependence between the changes in the wealth of each position.

This decomposition has been justified in Chap. 3 by the introduction of
the notion of copula, which is the pivot of the book.

With respect to the marginal risks, Chap. 2 has highlighted the weaknesses 
of traditional methods used to assess large downside risks, which are generally 
based upon tools derived from the extreme value theory. As an alternative, we 
have advocated using comprehensive parametric distributions which provide a 
good compromise between weak model errors — inherent to any parametric risk 
measurement — and accurate risk estimates. Accounting for model error is of 
vital importance for risk management purposes. This aspect is often forgotten, 
but it really plays a prominent role in the risk assessment process. Consider 
for instance the VaR estimates obtained under the Gaussian hypothesis. We 
have seen that, for large confidence levels, they have only little value and, yet, 
this class of models is still promoted by regulating institutions such as the 
Bank for International Settlements [42]. 
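
The point about Gaussian VaR estimates can be illustrated with a small
simulation. The sketch below is only an illustration under stated assumptions:
the losses are hypothetical draws from a Student distribution with three
degrees of freedom (our choice, not a calibration from this book), and the
function names are ours. It compares the VaR implied by the Gaussian hypothesis
with the empirical quantile at increasing confidence levels.

```python
import random
from statistics import NormalDist, mean, pstdev


def student_t_sample(nu, rng):
    """Draw one Student-t variate as Z / sqrt(W / nu), with W ~ chi-square(nu)."""
    z = rng.gauss(0.0, 1.0)
    w = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) = Gamma(shape nu/2, scale 2)
    return z / (w / nu) ** 0.5


def gaussian_var(losses, level):
    """VaR under the Gaussian hypothesis: mean + z_level * standard deviation."""
    return mean(losses) + NormalDist().inv_cdf(level) * pstdev(losses)


def empirical_var(losses, level):
    """Empirical VaR: the sample quantile of the losses at the given level."""
    ordered = sorted(losses)
    index = min(int(level * len(ordered)), len(ordered) - 1)
    return ordered[index]


rng = random.Random(12345)
# Hypothetical heavy-tailed losses (assumption made for this illustration).
losses = [student_t_sample(3.0, rng) for _ in range(100_000)]

for level in (0.95, 0.99, 0.999):
    print(level, round(gaussian_var(losses, level), 2),
          round(empirical_var(losses, level), 2))
```

With heavy-tailed losses of this kind, the two estimates are expected to drift
apart as the confidence level increases, the Gaussian figure falling short of
the empirical quantile at the highest levels.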

Concerning the cross-dependence between assets, we have emphasized that 
copulas are the most fundamental concept and tool. They should therefore 
constitute a cornerstone of modern risk management practices. Among many 
properties, this results in particular from the unique and optimal separation 
between individual risks and collective dependence that they provide. As a 
consequence, copulas have been shown to exhibit an unsurpassed flexibility 
and versatility for the elaboration of scenarios. This comes, however, at the 
cost of the preliminary calibration of the best copula, a problem which has not 
yet received a fully satisfying answer, as reviewed in Chap. 5. Nevertheless, 
the few families of copulas, which have been surveyed in Chap. 3, appear to 
be reasonable candidates for modeling the dependence structures of arbitrary 
baskets of financial assets and therefore allow for a relatively easy and useful 
generation of case studies. 

Chapters 5 and 6 have unearthed an important and somewhat surprising 
difference in the dependence between currencies and between stocks. Overall, 
the analyses which have been presented find a much weaker dependence be- 
tween stocks than between currencies. This is reflected quantitatively by the 
fact that the dependence between currencies can be described by Student’s 
copulas with a low number of degrees of freedom (typically 4 to 6). In contrast, 
the dependence between stock returns requires a larger number of degrees of
freedom (10 or more). This observation raises important questions for interna- 
tionally diversified portfolios. Indeed, if the dependence between the returns 
of foreign exchange rates is much stronger than the dependence between stock 
returns as reported in Chap. 5, the benefits of international diversification can 
vanish. It is true that various national stock markets exhibit anticorrelations 
between some stocks and at some epochs, which a priori would justify holding 
stocks from different national markets. However, if the corresponding curren- 
cies are positively associated with strong tail dependences, as in the Latin 
American markets (see Chap. 6), the diversification effect mostly disappears 
once the gains or losses of the stocks are translated into the same monetary 
unit. In particular, once expressed in the domestic currency of the holder of
an internationally allocated portfolio, the diversification effect seems to
disappear at times of turmoil, that is, exactly when the investor needs it the
most. 
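
To illustrate why a lower number of degrees of freedom of the Student copula
translates into a stronger dependence in the tails, the following sketch
simulates bivariate Student pairs and estimates, from the ranks only, the
probability that one variable falls in its lower tail given that the other
does. The correlation, the tail level and the sample size are hypothetical
values chosen for the illustration, and the function names are ours.

```python
import random


def student_pair(nu, rho, rng):
    """One draw from a bivariate Student-t with correlation rho and nu degrees of freedom."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
    # A single chi-square mixing variable is shared by both components.
    scale = (rng.gammavariate(nu / 2.0, 2.0) / nu) ** 0.5
    return z1 / scale, z2 / scale


def lower_tail_ratio(pairs, q):
    """Estimate Pr[V <= q | U <= q] from the ranks of the sample (empirical copula)."""
    n = len(pairs)
    k = int(q * n)  # number of observations in each marginal lower tail
    x_cut = sorted(x for x, _ in pairs)[k - 1]
    y_cut = sorted(y for _, y in pairs)[k - 1]
    joint = sum(1 for x, y in pairs if x <= x_cut and y <= y_cut)
    return joint / k


rng = random.Random(2024)
rho, q, n = 0.5, 0.01, 100_000  # hypothetical illustration values
results = {}
for nu in (4, 20):
    sample = [student_pair(nu, rho, rng) for _ in range(n)]
    results[nu] = lower_tail_ratio(sample, q)
    print(nu, round(results[nu], 3))
```

Because the estimator uses ranks only, it does not depend on the marginal
distributions and probes the copula alone. For the same correlation, the
conditional tail probability should come out larger for the lower number of
degrees of freedom.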

Chapter 6 has examined in depth this question of possible changes of de- 
pendence during financial crises. Two possible explanations have been consid- 
ered: (i) contamination operating via the sensitivity of dependence measures 
to the changing volatility level, or (ii) contagion reflecting a genuine change 
of dependence. The latter case leads to inefficient allocations when neglected.
However, the most important conclusion for risk management is not so much
the distinction between contamination and contagion but the presence of a
strong tail dependence that may exist between national markets, because it 
destroys the benefits of diversification across countries when one market goes 
in crisis. 


7.2 Outlook and Future Directions 


Our exposition has mainly focused on the concept of cross-sectional depen- 
dence between several random variables, but there are many aspects of the 
question, such as time dependencies, which have been only barely touched 
or which have been actually neglected. It may be useful to discuss them so 
as to provide a better appreciation of the limits of the methods proposed in 
this book and consequently of their domains of application. It is also useful 
to delineate possible exciting directions for future improvements of the
risk management practice. 


7.2.1 Robust and Adaptive Estimation of Dependences 


A major concern, especially for practitioners, is whether this whole math- 
ematical edifice, its algorithmic implementations and its rigorous statistical 
tests are relevant and useful for the really important risks, such as global 
market moves and crashes. It is indeed a common experience that the dependence
estimated and predicted by standard models changes dramatically at
certain times, not only during crashes, but also when the market exhibits a 
collective downward plunge. A quite common observation is that investment
strategies, which have some moderate $\beta$ (coefficient of regression to the
market) for normal times, can see their $\beta$ jump to a much larger value (close to 1
or larger depending on the leverage of the investment) at certain times when 
the market collectively dives. However, investments which are thought to be 
hedged against negative global market trends may actually lose as much or 
more than the global market, at certain times when a large majority of stocks 
plunge simultaneously. 

This question has been touched upon in Chap. 4 when discussing the 
possible strategies for preventing the large downward moves of a portfolio, 
based upon its tail dependence with the market. We will also revisit this
problem in the context of the occurrence of “outliers” and of time-varying
dependence. But it contains two other components: (1) Is the estimation of 
the dependence meaningful and really robust? (2) If not, what can be done? 

The first question deals with the development of robust estimation tech- 
niques, defined as methods which are insensitive to small departures from the 
idealized assumptions which have been used to optimize the algorithm. Such 
techniques include M-estimates (which follow from maximum likelihood con- 
siderations), L-estimates (which are linear combinations of order statistics), 
and R-estimates (based on statistical rank tests) [238, 479, 484]. 
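
As a minimal illustration of the first two families, the sketch below compares
the ordinary sample mean with a trimmed mean (an L-estimate) and with a Huber
location estimate (an M-estimate) on a short series contaminated by one extreme
value. The data are hypothetical, the function names are ours, and the tuning
constant 1.345 is a conventional choice rather than a prescription from the
references cited above.

```python
from statistics import mean, median


def trimmed_mean(data, fraction=0.1):
    """L-estimate: average of the order statistics after dropping both tails."""
    ordered = sorted(data)
    k = int(fraction * len(ordered))
    return mean(ordered[k:len(ordered) - k])


def huber_location(data, c=1.345, iterations=50):
    """M-estimate of location with Huber weights, via iteratively reweighted means."""
    mu = median(data)
    # Robust scale: median absolute deviation, rescaled to match a Gaussian sigma.
    scale = median([abs(x - mu) for x in data]) / 0.6745
    if scale == 0.0:
        scale = 1.0
    for _ in range(iterations):
        weights = [
            1.0 if abs(x - mu) <= c * scale else c * scale / abs(x - mu)
            for x in data
        ]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu


# Hypothetical returns (in percent) with one crash-like observation.
returns = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2, -0.15, 0.1, -8.0]

print("mean        ", round(mean(returns), 3))
print("trimmed mean", round(trimmed_mean(returns), 3))
print("Huber       ", round(huber_location(returns), 3))
```

The single extreme value drags the sample mean far from the bulk of the data,
whereas the two robust estimates remain close to it.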

The second question requires novel approaches. A possible one is inspired
by Herbert A. Simon, the famous economist and cognitive scientist studying
how people make real-world decisions, who observed that they seldom opti- 
mize. “Rather people seek strategies that will work well enough, that include 
hedges against various potential outcomes and that are adaptive. Tomorrow 
will bring information unavailable today; therefore, people plan on revising 
their plans” summarize Popper et al. [391]. In this spirit, people have de- 
veloped an approach to look not for optimal strategies but for robust ones, 
defined as strategies which perform well when compared with the alterna- 
tives across a wide range of plausible futures. “It need not be the optimal 
strategy in any future; it will, however, yield satisfactory outcomes in both 
easy-to-envision futures and hard-to-anticipate contingencies. This approach 
replicates the way people often reason about complicated and uncertain
decisions in everyday life” say Popper et al. The process of decision-making
under conditions of deep uncertainty requires first to consider ensembles of 
scenarios, then to seek robust and adaptive strategies, and finally to combine 
machine and human capabilities interactively. Outstanding questions involve 
the compromise between near-term objectives and long-term sustainability, 
and the characterization of irreducible risks and of “surprises” [301]. 

In the same vein, the approach in terms of universal portfolios initiated by 
Cover [113] a decade ago has opened the way to many studies [63, 227, 261]. 
Assuming for instance that one invests in a constantly rebalanced strategy, 
the question amounts to determining the best weights (the fraction of wealth 
invested in each stock) of this strategy for an investment horizon T. Since 
the optimal weights can only be assessed ex-post (once the time T has been
reached), it seems impossible to design ex-ante a strategy whose performance
will compare with the performance of the best strategy with hindsight. How- 
ever, it turns out that the universal approach promoted by Cover circumvents 
this problem. It consists in investing uniformly in all constantly rebalanced 
portfolio strategies. This results in a strategy that is nearly optimal in the 
sense that, for any sequence of stock market outcomes, this particular invest- 
ment strategy has a performance comparable to the best constantly rebalanced 
portfolio in the long run. More generally, any universal strategy is such that 
its average logarithmic performance over horizon T approaches the best
ex-post average logarithmic performance over the same horizon T, in the limit of
long horizon T, irrespective of the market price sequence from now to time T.
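
The mechanism can be sketched for two assets. In the illustration below, the
uniform investment over all constantly rebalanced portfolios is approximated by
an average over a finite grid of weights (a discretization that is our
simplification), and the market is a hypothetical random sequence of price
relatives. All names are ours.

```python
import math
import random


def crp_log_wealth(b, price_relatives):
    """Log-wealth of the constantly rebalanced portfolio holding fraction b in asset 1."""
    return sum(math.log(b * x1 + (1.0 - b) * x2) for x1, x2 in price_relatives)


def universal_vs_best(price_relatives, grid_size=101):
    """Return (universal log-wealth, best hindsight CRP log-wealth) on a weight grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    logs = [crp_log_wealth(b, price_relatives) for b in grid]
    best = max(logs)
    # Uniform average of the CRP wealths, computed stably in log space.
    universal = best + math.log(sum(math.exp(v - best) for v in logs) / grid_size)
    return universal, best


rng = random.Random(7)
# Hypothetical price relatives (gross returns) for two volatile assets.
market = [(rng.choice((0.8, 1.3)), rng.choice((0.9, 1.15))) for _ in range(2000)]

for horizon in (100, 500, 2000):
    universal, best = universal_vs_best(market[:horizon])
    # Per-period shortfall of the universal strategy relative to the best CRP.
    print(horizon, round((best - universal) / horizon, 5))
```

On the grid, the average wealth is at least the best wealth divided by the
number of grid points, so the total logarithmic shortfall is bounded by a
constant and the per-period shortfall vanishes as the horizon grows, for any
sequence of price relatives.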


7.2.2 Outliers, Kings, Black Swans and Their Dependence


In its conclusion, Chap. 2 notes the existence of “outliers” (also called “kings” 
or “black swans”), in the distribution of financial risks measured at variable 
time scales such as with drawdowns. These outliers are identified only with 
metrics adapted to take into account transient increases of the time depen- 
dence in the time series of returns of individual financial assets [249] (see also 
Chap. 3 of [450]). These outliers seem to belong to a statistical population 
which is different from the bulk of the distribution and require some additional 
amplification mechanisms active only at special times. 

Chapter 5 shows that two exceptional events in the period from January 
1989 to December 1998 stand out in statistical tests determining the relevance 
of the Gaussian copula to describe the dependence between the German Mark 
and the Swiss Franc. The first of the two events is the coup against Gorbachev 
in Moscow on 19 August, 1991 for which the German Mark (respectively the
Swiss Franc) lost 3.37% (respectively 0.74%) against the US dollar. The second
event occurred on 10 September, 1997, and corresponds to an appreciation of
the German Mark of 0.60% against the US dollar while the Swiss Franc lost
0.79%, which represents a moderate move for each currency, but a large joint
move. 

The presence of such outliers both in marginal distributions and in con- 
comitant moves, together with the strong impact of crises and of crashes, 
suggests the need for novel measures of dependence between drawdowns and 
other time-varying metrics across different assets. This program is part of 
the more general need for a joint multi-time-scale and multi-asset approach 
to dependence. Examples of efforts in this direction include multidimensional 
GARCH models [23, 45, 46, 154, 296, 400, 477] and the multivariate multi- 
fractal random walk [366]. It also epitomizes the need for new multi-period 
risk measures, which would account for this class of events. Several avenues 
of research have recently been opened by attempting to generalize the no- 
tions of Value-at-Risk and of coherent measures of risk within a multi-period 
framework [21, 405, 483].


7.2.3 Endogeneity Versus Exogeneity 


The presence of outliers such as those mentioned in the previous section poses
the problem of exogeneity versus endogeneity. An event identified as anomalous
could perhaps be cataloged as resulting from exogenous influences.¹
The same issue has been investigated in Chap. 6 when testing for contagion
versus contamination in the Latin American crises. Contamination refers to
an endogenous dependence described by an approximately constant copula.
In contrast, contagion is by definition the concept that the dependence has
changed, either transiently or with lasting effects, due to some influence or
mechanism which is exogenous (in a sense that needs to be defined precisely)
to the previous regime.

¹ However, outliers may also have an endogenous origin, as described for financial
crashes [250, 449, 450].

The concept of exogeneity² is fundamental in empirical econometric modeling
and statistical estimation (see for instance [152, 156]). Here, we refer
to the question of exogeneity versus endogeneity in the broader context of
self-organized criticality³ [32, 231, 246, 451], inspired in particular by the
physical and natural sciences. According to self-organized criticality, extreme 
events are seen to be endogenous, in contrast with previous prevailing views 
(see for instance the discussion in [33, 448]). But, how can one assert with 
100% confidence that a given extreme event is really due to an endogenous 
self-organization of the system, rather than to the response to an external 
shock? Most natural and social systems are indeed continuously subjected 
to external stimulations, noises, shocks, solicitations, and forcing, which can 
widely vary in amplitude. It is thus not clear a priori if a given large event 
is due to a strong exogenous shock, to the internal dynamics of the system, 
or maybe to a combination of both. Addressing this question is fundamental 
for understanding the relative importance of self-organization versus external 
forcing in complex systems and underpins much of the problem of dependence 
between variables. 

The question, whether distinguishing properties characterize endogenous 
versus exogenous shocks, permeates many systems, for instance, biological 
extinctions such as the Cretaceous/Tertiary KT boundary (meteorite versus 
extreme volcanic activity versus self-organized critical extinction cascades), 
commercial successes (progressive reputation cascade versus the result of a 
well-orchestrated advertisement), immune system deficiencies (external vi- 
ral/bacterial infections versus internal cascades of regulatory breakdowns), 
the aviation industry recession (9/11 versus structural endogenous problems), 
discoveries (serendipity versus the outcome of slow endogenous maturation 
processes), cognition and brain learning processes (role of external inputs ver- 
sus internal self-organization and reinforcements) and recovery after wars (in- 
ternally generated — i.e., civil wars — versus imported from the outside) and 
so on. In economics, endogeneity versus exogeneity has been hotly debated 
for decades. A prominent example is the theory of Schumpeter [432] on the 
importance of technological discontinuities in economic history. Schumpeter 
argued that “evolution is lopsided, discontinuous, disharmonious by nature... 
studded with violent outbursts and catastrophes. ..more like a series of ex- 
plosions than a gentle, though incessant, transformation”. Endogeneity versus 
exogeneity is also paramount in economic growth theory [415]. 

² In a nutshell, conditioning on an exogenous variable does not decrease the amount
of information in parameter estimation [152].
³ Self-organized criticality is part of the theory of complex systems.

Several pieces of evidence of quantitative signatures distinguishing exogenous from
endogenous shocks have recently been described. Concerning the way the
continuous stream of news gets incorporated into market prices for instance,
it has recently been shown how one can distinguish the effects of events like 
the 11 September, 2001 attack or the coup against Gorbachev on 19 August, 
1991 from events like financial crashes such as October, 1987 as well as smaller 
volatility bursts. Based on a stochastic volatility model with long range de- 
pendence (the so-called “multifractal random walk”, whose main properties 
are given in Appendix 2.A), Sornette et al. [456] have predicted different re- 
sponse functions of the volatility to large external shocks compared with what 
we term endogenous shocks, i.e., which result from the cooperative accumulation
of many small pieces of news. This theory, which has been successfully tested
against empirical data with no adjustable parameters, suggests a general
classification into two classes of events (endogenous and exogenous) with specific
signatures and characteristic precursors for the endogenous class. It also
proposes a simple origin for endogenous shocks as the accumulation, in certain
circumstances, of tiny pieces of bad news that add coherently due to their persistence.

Another example supporting the existence of specific signatures distin- 
guishing endogenous and exogenous events has been provided by a recent in- 
vestigation concerning the origin of the success of best sellers [128, 455]. The 
question is whether the latest best seller is simply the product of a clever
marketing campaign or if it has truly permeated society. In other words, can one
determine whether a book’s popularity will wane as quickly as it appeared
or whether it will become a classic for future generations? The study in [128, 455]
describes a simple and generic method that distinguishes exogenous shocks 
(e.g., very large news impact) from endogenous shocks (e.g., a book that
becomes a best seller by word of mouth) within the network of online buyers.
An endogenous shock appears slowly but results in a long-lived growth and 
decay of sales due to small but very extensive interactions in the network of 
buyers. In contrast, while an exogenous shock appears suddenly and propels 
a book to best seller status, these sales typically decline rapidly as a power 
law with exponent larger than for endogenous shocks. These results suggest 
that the network of human acquaintances is close to “critical,” with informa- 
tion neither propagating nor disappearing but spreading marginally between 
people. These results have interesting potential for marketing agencies, which 
could measure and maximize the impact of their publicity on the network of 
potential buyers, for instance. 

These two examples show that the concepts of endogeneity and exogene- 
ity should have many applications including the modeling and prediction of 
financial crashes [250, 458], Initial Public Offerings (IPO) [245], the movie 
industry [119] and many other domains related to marketing [452], for which 
the mechanism of information cascade derives from the fact that agents can 
observe box office revenues and communicate word of mouth about the quality 
of the movies they have seen. The formulation of a comprehensive theory of 
(time) dependence allowing one to characterize endogeneity and exogeneity and to
distinguish between them is thus of great importance in future developments.


7.2.4 Nonstationarity and Regime Switching in Dependence


The general problem of the application of mathematical statistics to nonsta- 
tionary data (including nonstationary time series) is very important, but alas, 
not much can be done. There are only a few approaches which may be used 
and only in specific conditions: 


• Use of algorithms and methods which are robust with respect to possible
nonstationarity in data, such as normalization procedures or the use of
quantile samples instead of initial samples.

• Model nonstationarity by some low-frequency random processes, such as,
e.g., a narrow-band random process $X(t) = A(t) \cos(\omega t + \varphi(t))$ where
$\omega \ll 1$ and $A(t)$ and $\varphi(t)$ are slowly varying amplitude and phase.
In this case, the Hilbert transform can be very useful to characterize $\varphi(t)$
nonparametrically [79, 379].

• The estimation of the parameters of a low-frequency process based on a
“short” realization is often hopeless. In this case, the only quantity which
can be evaluated is the uncertainty (or scatter) of the results due to the
nonstationarity.
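
The use of the Hilbert transform mentioned in the second item can be sketched
as follows. The code builds a hypothetical narrow-band signal, forms its
analytic signal with a plain discrete Fourier transform (kept in pure Python
for self-containment; in practice a library routine would normally be used),
and reads off the slowly varying phase and amplitude. The carrier frequency,
the modulations and all names are assumptions made for the illustration.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal x + i*H[x] via a plain discrete Fourier transform (O(n^2))."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if 0 < k < n / 2:
            spectrum[k] *= 2.0  # double the positive frequencies
        elif k > n / 2:
            spectrum[k] = 0.0   # remove the negative frequencies
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


n = 256
omega = 2.0 * math.pi * 16 / n                       # hypothetical carrier frequency
slow = [2.0 * math.pi * t / n for t in range(n)]     # slow time scale
amplitude = [1.0 + 0.3 * math.cos(s) for s in slow]  # slowly varying A(t)
phase = [0.5 * math.sin(s) for s in slow]            # slowly varying phi(t)
signal = [a * math.cos(omega * t + p)
          for t, (a, p) in enumerate(zip(amplitude, phase))]

z = analytic_signal(signal)
# Removing the carrier from the instantaneous phase leaves an estimate of phi(t).
phi_hat = [cmath.phase(zt * cmath.exp(-1j * omega * t)) for t, zt in enumerate(z)]
a_hat = [abs(zt) for zt in z]
max_phase_error = max(abs(e - p) for e, p in zip(phi_hat, phase))
max_amplitude_error = max(abs(e - a) for e, a in zip(a_hat, amplitude))
print(max_phase_error, max_amplitude_error)
```

No parametric form is assumed for the phase or the amplitude; the recovery
relies only on the modulations being slow compared with the carrier.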


Regime switching, popularized by Hamilton [221] for autoregressive time
series models, is a special case of nonstationarity, which can be handled with
specific methods. Regime switching has been extensively used in business cycle 
analysis in order to describe the economic fluctuations in a rigorous statistical 
framework. The key idea is that the parameters of a model may switch between 
two (or more) regimes, where the switching is governed by a time-dependent 
state variable $S_t$ which typically takes two values, 0 or 1. When $S_t = 0$, the
parameters of the model are different from those when $S_t = 1$. Clearly, if $S_t$
were an observed variable, the parameters could simply be estimated using
dummy variable methods. Regime-switching methods rely on the observation
by Hamilton that, even when the state is unobservable, the parameters of the
model in each state can be estimated provided that restrictions are placed
on the probability process governing $S_t$. The simplest such restriction is to
assume that $S_t$ obeys the dynamics of a first-order Markov chain, which means
that any persistence in the state is completely embodied in the value of the 
state in the previous period. Many generalizations are under development. 
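
The role of the Markov restriction can be illustrated by the filtering step
alone. The sketch below is a simplified illustration, not the full estimation
procedure: it takes the parameters of each regime and the transition
probabilities as known (in practice they are estimated, typically by maximum
likelihood) and computes the probability of the hidden state given the
observations to date. The regime parameters are hypothetical and all names are
ours.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def hamilton_filter(observations, params, p_stay):
    """Filtered probabilities Pr[S_t = 1 | data up to t] for a two-state Markov chain.

    params[s] = (mean, volatility) of the Gaussian observation in state s;
    p_stay[s] = Pr[S_t = s | S_{t-1} = s].
    """
    prob1 = 0.5  # flat prior on the initial state
    filtered = []
    for y in observations:
        # Prediction step: propagate the state probability through the Markov chain.
        pred1 = prob1 * p_stay[1] + (1.0 - prob1) * (1.0 - p_stay[0])
        # Update step: reweight by the likelihood of the new observation in each state.
        like0 = (1.0 - pred1) * gaussian_pdf(y, *params[0])
        like1 = pred1 * gaussian_pdf(y, *params[1])
        prob1 = like1 / (like0 + like1)
        filtered.append(prob1)
    return filtered


rng = random.Random(1)
params = {0: (1.0, 0.5), 1: (-2.0, 1.0)}  # hypothetical (mean, volatility) per regime
p_stay = {0: 0.98, 1: 0.95}               # hypothetical persistence of each regime

state, states, data = 0, [], []
for _ in range(1000):
    if rng.random() > p_stay[state]:
        state = 1 - state
    states.append(state)
    data.append(rng.gauss(*params[state]))

probs = hamilton_filter(data, params, p_stay)
accuracy = sum((p > 0.5) == (s == 1) for p, s in zip(probs, states)) / len(states)
print(round(accuracy, 3))
```

The persistence of the chain enters only through the prediction step, which is
where the first-order Markov assumption is used.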

In recent studies, many workers have extended this idea to model regime 
switches in the dependence structure of financial assets. We refer to [382, 498] 
for a perspective of recent efforts and references therein. Developing these 
ideas in the framework of copulas is a promising avenue for future research. 

Regime switching is also appealing from a microeconomic viewpoint as it 
may reflect the changing conventions used by investors. Conventions can be 
formed by the belief of agents on the existence of correlations between infor- 
mation and returns for instance. Following this belief, agents try to estimate 
this correlation from past time series and act on it, thus creating it [494].
Another mechanism for conventions is based on imitation and moods [460]. Both
mechanisms predict the existence of random abrupt changes. For the future,
it would be interesting to combine both mechanisms as they are arguably 
present together in real markets, in order to clarify their relative importance 
and interplay. Another important field of research is to combine these micro- 
economic models with the tools developed to detect regime switching. 


7.2.5 Time-Varying Lagged Dependence 


Determining the arrow of causality between two time series X(t) and Y(t) has 
a long history, especially in economics, econometrics and finance and it is often 
asked which economic variable might influence other economic phenomena 
[93, 199]. This question is raised in particular for the relationships between
respectively inflation and GDP, inflation and growth rate, interest rate and 
stock market returns, exchange rate and stock prices, bond yields and stock 
prices, returns and volatility [95], advertising and consumption and so on. One 
simple naive measure is the lagged cross-correlation function 


$$
C_{X,Y}(\tau) = \frac{\mathrm{Cov}\left[X(t),\, Y(t+\tau)\right]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} \, .
$$


Then, a maximum of $C_{X,Y}(\tau)$ at some nonzero positive time lag $\tau$ implies
that the knowledge of X at time t gives some information on the future
realization of Y at the later time $t + \tau$. However, such correlations do not
necessarily imply causality in a strict sense as a correlation may be mediated by a
common source influencing the two time series at different times. The concept
of Granger causality bypasses this problem by taking a pragmatic approach
based on predictability: if the knowledge of X(t) and of its past values
improves the prediction of $Y(t + \tau)$ for some $\tau > 0$, then it is said that X
Granger causes Y [22, 199] (see [98] for a recent extension to nonlinear time
series). Such a definition does not address the fundamental philosophical and 
epistemological question of the real causality links between X and Y but has 
been found useful in practice. 
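
The lagged cross-correlation measure can be sketched in a few lines. In the
illustration below, the sample estimate is normalized by the product of the
standard deviations of the two series, and the data are hypothetical: Y is
constructed to follow X with a delay of three time steps plus noise. All names
and parameter values are ours.

```python
import random
from statistics import mean, pstdev


def lagged_cross_correlation(x, y, lag):
    """Sample correlation between X(t) and Y(t + lag), for lag >= 0."""
    n = len(x) - lag
    mx, my = mean(x), mean(y)
    cov = sum((x[t] - mx) * (y[t + lag] - my) for t in range(n)) / n
    return cov / (pstdev(x) * pstdev(y))


rng = random.Random(42)
true_lag = 3  # hypothetical: Y follows X with a delay of three time steps
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.0] * true_lag + [0.8 * x[t - true_lag] for t in range(true_lag, len(x))]
y = [v + 0.6 * rng.gauss(0.0, 1.0) for v in y]

correlations = {lag: lagged_cross_correlation(x, y, lag) for lag in range(0, 8)}
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

The maximum of the estimated function locates the delay built into the data.
As stressed above, such a maximum indicates predictive information and not
causality in a strict sense.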

However, most economic and financial time series are not strictly sta- 
tionary and the lagged correlation/dependence and/or causality between two 
time series may be changing as a function of time, for instance reflecting regime
switches and/or changing agent expectations. It is thus important to define
tests of causality or of lagged dependence which are sufficiently reactive to such
regime switches, allowing one to follow almost in real time the evolving structure
of the causality. Cross-correlation methods and Granger causality tests require
a rather substantial amount of data in order to obtain reliable conclusions. In
addition, cross-correlation techniques are fundamentally linear measures of de- 
pendence and may miss important nonlinear dependence properties. Granger 
causality tests are most often formulated using linear parametric autoregressive
models. It may thus be that many of the paradoxes in macroeconomics
concerning the causal relationship between such variables as inflation,
inflation change, GDP growth rate and unemployment rate could result from an
inadequate description of the time-varying lag cross-sectional dependence.

Recently, a new method, called “Optimal thermal causal path”, has been 
introduced [459]. It is both nonparametric and sufficiently general so as to 
detect a priori arbitrary nonlinear dependence structures. Moreover, it is 
specifically conceived so as to adapt to the time evolution of the causality 
structure. The “Optimal thermal causal path” can be viewed as an extension 
of the “time distance” measure, which amounts to comparing trend lines upon
horizontal differences of two time series [212]. 

The development of generalized dependence measures using such a time-adaptive
lag structure seems to be another promising direction for future work.


7.2.6 Toward a Dynamical Microfoundation of Dependences 


The need for the rather sophisticated statistical methods described in this 
book, as well as the developments suggested in this concluding chapter, reflects
in our opinion the absence of a fundamental genuine economic understanding.
By comparison, in the natural sciences the need for such statistical
methods has been less important, probably because most of the fundamental
equations are known (at least at the macroscopic level) and the challenge lies 
more in understanding the emergence of complex solutions from seemingly 
simple mathematical formulations. In physics, for instance, the issues of de- 
pendences raised in this book are better and more simply attacked from a 
study of the fundamental dynamical equations. In contrast, we lack a deep 
underpinning for understanding the mechanisms at the origin of the dynam- 
ical behavior of financial markets. It is thus possible that the emerging field 
of behavioral finance, with its sister fields of neuroeconomics and evolution- 
ary psychology, and their exploration of the impact on decision making of 
imperfect bounded subjective probability perceptions [36, 206, 437, 439, 474], 
may provide a fundamental shift in our understanding and therefore in the 
formulation of dependence between assets. This will have major impacts on 
risk assessment and its optimization. 
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dependence 
Coherent measures of risk 4,276 
Comonotonicity 101,102, 107,149, 
155, 160 
Complete market 133, 136 
Complete monotonicity 111 
Concordance measure 154-162, 165 
Conditional correlation coefficient 233 
Consistent measures of risk 7 
Contagion XI, 231, 260 
Contingent claim see Option 
Convex measure of risk 7 
Copula_ X, 34, 35, 103, 273 
Archimedean see Archimedean 
copula 
dual 104,118 
elliptical see Elliptical copula 
extreme value see Extreme value 
copula 
Fréchet-Hoeffding bounds 106 
survival 104, 114, 132, 140, 166, 215 
Correlation coefficient 2, 24,99, 105, 
147-154, 165, 173, 174, 189, 219, 
220 
Hoeffding identity 149 
Countermonotonicity 107, 149, 155, 
160, 164 
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Coup against Gorbachev 209, 276 
Covariance matrix 3, 24, 33 
CreditMetrics 138 
CreditRiskt 137 
CSFB/Tremont index 167 
Currency 
British Pound 208, 210 
Euro 195, 203, 217 
German Mark 197-199, 208-210, 
214, 215, 276 
Japanese Yen 197, 199, 208, 215, 217 
Malaysian Ringit 208, 210, 214 
Swiss Franc 208-210, 214, 276 
Thai Baht 208, 210 
US Dollar 208-210, 240, 276 


Default risk 100, 137, 180 
Dependence 2,101 
mutual complete 101 
positive orthant 119,164 
Dependence measure 147, 161 
Dependence metric 162 
Dependence structure see copula 
Derivative see Option 
Digital option 131 
Distribution function 
Exponential 58, 90,175 
Fréchet 45, 255 
Gamma __ 58, 90, 175 
Gaussian 2, 37, 148, 169, 175, 233 
Generalized Pareto 39, 44-47, 116 
GEV 45, 47,116 
Gumbel 45, 48 
Lévy stable law 2, 39, 42, 148, 243 
Log-normal 37, 60, 78, 150 
Log-Weibull 60, 69, 91 
Meta-elliptical 109 
Meta-Gaussian 108 
Modified-Weibull 128 
Pareto 39,57, 64, 88 
Pearson type-VII 42 
Shifted-Pareto 126 
Stretched exponential 43, 50, 57 
Student t 42,108, 157, 175, 233 
Weibull 50, 57, 67, 88 
Diversification 19, 180 
Dow Jones Industrial Average Index 
44, 53, 62, 78 
Drawdown 23, 36, 79, 276 


Dual copula 104,118 
Efficient market hypothesis 20 
Elliptical copula 107, 196, 200 
Gaussian copula 108 
Kendall’s tau. 157 
simulation 120 
Student’s copula 109 
tail dependence 172 
Empirical copula 190 
Endogeneity 276 
Euro 195, 203, 217 
European Monetary System 210 
Evolutionary stable equilibrium 20 
Exogeneity 277 
Expectation-bounded measures of risk 
7 
Expected utility 4, 21,165 
Expected-Shortfall 47,79 
Exponential distribution 58, 90,175 
Extremal index 46 
Extreme value copula 116,117 
Extreme value theory 43, 45, 254, 273 


Factor model 3, 19, 24, 29, 111, 138, 
174, 233, 238, 255 
Fat tail see Heavy tail 
Federal Reserve Board 208 
Firm size 37 
Foreign exchange rate 215 
Fréchet distribution 45, 255 
Fréchet-Hoeffding bounds 106,115, 
117,119 
Fractality 80 
Fractional Brownian motion 81 
Frailty model 113 
Frank’s copula 112 
Kendall’s tau 156 
simulation 123 
tail dependence 171 
Friendship theorem 26 


Gain-loss ratio 13 

Gamma distribution 58,90, 175 

GARCH _ 35, 37, 43, 108, 205, 217, 219, 
231 

Gaussian copula 108, 128, 130, 131, 
135, 137, 204, 212, 217 


Gaussian distribution 2,37, 148, 169, 
175, 233 

General deviation measures 8 

Generalized Extreme Value distribution 


45, 47, 116 

Generalized Pareto distribution 39, 
44-47, 116 

German Mark 197-199, 208-210, 214, 


215, 276 

Gini’s gamma _ 161, 165 
Girsanov theorem 144 
Gnedenko theorem 45, 76 
Goodness of fit 61, 164, 189 
GPBH theorem 115 
Great Depression 53 
Gumbel distribution 45, 48 
Gumbel’s copula 112,117 

Kendall’s tau. 156 

simulation 123 

tail dependence 171 


Heavy tail 2,12, 15, 36, 38, 42, 57, 157 
Heteroscedasticity see Volatility 
clustering 


High frequency data 35, 37, 44 
Hill estimator 43, 48, 64, 240 
Hoeffding identity 149 


Ibov index 240 

Incomplete gamma function 57 
Inflation VII 

Information matrix 

Fisher 198, 201, 222 
Godambe 202 

Internet bubble VII, VIII 
Invariance theorem 105 

Ipsa index 240 


Japanese Yen 197, 199, 208, 215, 217 


Kendall’s tau. 154, 165, 196, 200, 249 
Archimedean copula 155 
elliptical copula 157 

Kernel estimator 192 

King = see Outlier 

KMV_ 138 

Kolmogorov distance 61 


Kullback-Leibler divergence 61,163 


Lévy process 35 
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Lévy stable law 2, 39, 42, 148, 243 
Lambert function 172 
Laplace transform 113 
Latin American crisis 228, 260 
Argentinean crisis 231, 233, 240, 261 
Mexican crisis 230, 233, 240, 247, 
261 
Linear dependence 
coefficient 
Local correlation coefficient 151 
Log infinitely divisible process 41 
Log-normal distribution 37,60, 78, 150 
Log-Weibull distribution 60,69, 91 
LTCM 24 
Lunch effect 53 


see Correlation 


Malaysian Ringit 208, 210, 214 
Market crash VII, 23, 36, 38 

April 2000 230 

October 1987 VII, VIII, 26, 230, 247 
Market index 

CSFB/Tremont 167 

Dow Jones Industrial Average 44, 

53, 62, 78 

Ibov 240 

Ipsa 240 

Merval 240 

Mexbol 240 

Nasdaq Composite 53, 77, 230 

Standard & Poor’s 500 39, 60, 129, 

167, 177, 216 

Market liquidity 6, 41 
Market trend 231, 234, 253, 255 
Markowitz’ portfolio selection see 
Mean-variance portfolio theory 
Maximum domain of attraction 45, 46
Mean-variance portfolio theory VIII, 
33, 38, 58 
Merton model of credit risk 20, 137 
Merval index 240 
Meta-elliptical distribution 109 
Meta-Gaussian distribution 108 
Mexbol index 240 
Micro-structure 189, 215
Minimum option 135 
Minority game 22 
Mixture model 137 
Modified-Weibull distribution 128, 131
Monte Carlo 120, 192
Multifractal Random Walk 39, 84, 210, 219, 278
Mutual complete dependence 101 


Nasdaq Composite index 53, 77, 230
New economy 230
Normal law see Gaussian distribution


Occam’s razor 212 
Option 33, 100, 131, 192
digital 131 
minimum 135 
rainbow 135 
Outlier 23, 30, 36, 80, 206, 209, 210, 
276 


Pareto distribution 39, 57, 64, 88
Pearson estimator 148, 157 
Pearson type-VII distribution 42 
Pickands estimator 43, 47-49 
Portfolio 3, 100, 179, 189, 200, 216 
analysis 205 
insurance 14, 23 
management VIII, 180, 212, 217, 231 
risk 3, 33, 124, 127, 128, 177, 192
theory 3, 33
Positive orthant dependence 119, 164
Archimedean copula 166
Pseudo likelihood 197, 215, 222
Pseudo-sample 197, 206


Quantile 35 


Rainbow option 135 
Regular variation 39, 171
Risk VI, VIII, 1 
analysis 13 
assessment 78, 124, 212, 231 
aversion 4, 16, 17
management VIII, 35, 79, 100, 205, 
271 
measure 10 
coherent measures of risk 4, 276 
consistent measures of risk 7 
expectation-bounded measures of 
risk 7 
general deviation measures 8 
spectral measures of risk 6 
premium 37 
Russian crisis 230 


Securitization 100


Self-organized criticality 277 

Self-similarity 80 

Semi-invariant 10 

Shannon entropy 163 

Sharpe’s market equilibrium model 
see Capital asset pricing model 

Shifted-Pareto distribution 126 

Sklar’s theorem 104, 107, 120, 190 

Spearman’s rho 159, 165, 196, 248 

Spectral measures of risk 6 

Standard & Poor’s 500 39, 60, 129, 
167, 177, 216 

Stone-Weierstrass theorem 192 

Stress testing 10, 43, 210, 214 

Student t distribution 42, 108, 157, 175, 233

Student’s copula 109, 117, 131, 195, 
212, 256 

Survival copula 104, 114, 132, 140, 166, 
215 


Swarm intelligence 22 
Swiss Franc 208-210, 214, 276 
Swiss National Bank 210 


Tail dependence 
233, 254 

Archimedean copula 170 
elliptical copula 172 
factor model 174 

Tail risk 124, 131, 177, 216 

Thai Baht 208, 210 

Theorem 
central limit 
friendship 26 
Girsanov 144 
Gnedenko 45, 76 
GPBH 46, 115
invariance 105 
Sklar 104, 107, 190 
Stone-Weierstrass 192 
Wilks 71, 93, 216, 224


US Dollar 208-210, 240, 276


Value-at-Risk 2, 5, 43, 46, 79, 100, 124,
128, 131, 168, 271, 276
Volatility 2, 231
clustering 35, 38, 39, 43, 194, 217 
Volume of transactions 41 


Weibull distribution 50, 57, 67, 88 
Wilks theorem 71, 93, 216, 224 


